I have a recurring issue where my git repo (I think?) will decide it needs to garbage collect. This process takes well over a half hour, and will then trigger on every pull/push operation.
Running Git GC manually takes a half hour, but doesn't seem to fix the issue. The only solution I have found is to delete my repo and clone fresh, which is suboptimal for any number of reasons.
My git gc operations may be slow because I have set some memory limits in git to stop it from crashing during gc, as it used to do when it hit the 4 GB Windows memory limit and then crapped out.
Any help would be appreciated. It is a large repo: it contains a significant amount of binary data, as well as a large number of very sizeable (>500k) text files.
So,
1. How do I limit how often Git decides to garbage collect?
2. How do I speed up the GC operation?
3. What can I do to solve or minimize the greater issues involved (i.e., why it has to garbage collect in the first place)?
The only real way around it is to reduce the size of your repository. You can disable automatic garbage collection with git config --global gc.auto 0, but that will increase your network traffic on pushes and pulls, if they even still work at all, and will increase your local disk space used for git. Without git gc, your local repo will contain a full copy of every revision of every file you change. However, that might be feasible if you do something like run git gc every night while you are gone.
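A minimal sketch of that setup, assuming you would rather switch auto-gc off for the one large repository than globally (so the --global flag is dropped). It runs against a throwaway repository created with mktemp so nothing real is touched; the variable names are only illustrative. In your own clone you would run just the git config and git gc lines, and schedule the latter with cron or Task Scheduler.

```shell
#!/bin/sh
# Sketch only: a scratch repository stands in for your real clone.
repo=$(mktemp -d)
git init -q "$repo"

# Disable automatic gc for this repository only (no --global),
# so other, smaller repositories keep their default behaviour.
git -C "$repo" config gc.auto 0

# Read the setting back to confirm it took effect.
auto_setting=$(git -C "$repo" config --get gc.auto)
echo "gc.auto=$auto_setting"

# The manual collection you would schedule for off-hours.
git -C "$repo" gc --quiet
gc_status=$?
echo "git gc exit status: $gc_status"
```

With gc.auto set to 0, pushes and pulls no longer start a collection on their own, so the scheduled git gc is the only thing keeping the loose-object count down.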
I'd recommend looking into something like git annex, which was designed for situations like yours. It basically stores a pointer to large files in your repo rather than the files themselves.
Note: a commit from Git 2.17 (Q2 2018) points out the existence of a hook that git gc --auto will call, and which can help minimize the effects of that command.
You can read more about git gc --auto in "Understanding git gc --auto".
A sample auto-gc hook (in contrib/) to skip auto-gc while on battery has been updated to almost always allow running auto-gc unless on_ac_power command is absolutely sure that we are on battery power (earlier, it skipped unless the command is sure that we are on ac power).
See commit 781262c (28 Feb 2018) by Adam Borowski (kilobyte).
(Merged by Junio C Hamano -- gitster -- in commit b423234, 14 Mar 2018)
hooks/pre-auto-gc-battery: allow gc to run on non-laptops
Desktops and servers tend to have no power sensor, thus on_ac_power returns 255 ("unknown"). Thus, let's take any answer other than 1 ("battery") as no contraindication to run gc.
If that tool returns "unknown", there's no point in querying other sources as it already queried them, and is smarter than us (can handle multiple adapters).
So, depending on your case, setting up that hook can give you a say as to whether git gc --auto should execute or not.
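As a rough illustration, a pre-auto-gc hook that exits with a non-zero status makes git gc --auto skip its work. The sketch below installs such a hook into a throwaway repository created with mktemp. The "skip between 08:00 and 18:00" rule is purely an assumption chosen for the example, not something taken from the contrib hook quoted above; swap in whatever condition suits you.

```shell
#!/bin/sh
# Sketch only: a scratch repository stands in for your real clone.
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/.git/hooks"
hook="$repo/.git/hooks/pre-auto-gc"

# Write the hook. The working-hours window is an illustrative assumption.
cat > "$hook" <<'EOF'
#!/bin/sh
# pre-auto-gc: a non-zero exit makes "git gc --auto" skip its work.
hour=$(date +%H)
hour=${hour#0}   # strip a leading zero so "08" compares as 8
if [ "$hour" -ge 8 ] && [ "$hour" -lt 18 ]; then
    echo "Skipping auto-gc during working hours" >&2
    exit 1
fi
exit 0
EOF

# Git only runs hooks that are executable.
chmod +x "$hook"
echo "installed $hook"
```

The hook only affects automatic collection; a git gc that you start by hand or from a scheduled job still runs regardless of what the hook returns.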