How do I fix these Git GC problems?

Posted 2019-05-09 06:32

Question:

I have a recurring issue where my Git repo (I think?) decides it needs to garbage collect. The process takes well over half an hour, and it then triggers on every pull/push operation.

Running git gc manually also takes half an hour, but doesn't seem to fix the issue. The only solution I have found is to delete my repo and clone fresh, which is suboptimal for any number of reasons.

My git gc operations may be slow because I have set some memory limits for Git to stop it from crashing during gc, as it used to do when it hit the 4 GB Windows memory limit and then crapped out.

Any help would be appreciated. It is a large repo that contains a significant amount of binary data, as well as a large number of very sizeable (>500k) text files.

So:
1. How do I limit how often Git decides to garbage collect?
2. How do I speed up the GC operation?
3. What can I do to solve or minimize the greater issues involved (i.e., why it has to garbage collect in the first place)?

Answer 1:

The only real way around it is to reduce the size of your repository. You can disable automatic garbage collection with git config --global gc.auto 0, but that will increase your network traffic on pushes and pulls (if they even still work at all) and will increase the local disk space Git uses. Without git gc, your local repo will contain a full copy of every revision of every file you change. That might still be feasible if you run git gc yourself on a schedule, for example every night while you are away.
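
The "disable auto gc, then collect on a schedule" approach could look roughly like the sketch below. It assumes a POSIX shell with git on the PATH; the function names and the repository path argument are hypothetical helpers for illustration, not part of Git. It sets gc.auto per repository rather than with --global, so it can be tried on one repo first.

```shell
#!/bin/sh
# Sketch only: disable_auto_gc and nightly_gc are illustrative names.

# Turn off automatic gc for one repository.
# (Use `git config --global gc.auto 0` instead to affect every repository.)
disable_auto_gc() {
    git -C "$1" config gc.auto 0
}

# Run a full gc by hand; meant to be called from a scheduled job
# (cron, Windows Task Scheduler, ...) while nobody is using the repo.
nightly_gc() {
    git -C "$1" gc --quiet
}
```

Calling disable_auto_gc /path/to/repo once and scheduling nightly_gc /path/to/repo for off hours keeps the slow collection out of the way of pulls and pushes. It does not make the gc itself any faster.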

I'd recommend looking into something like git annex, which was designed for situations like yours. It basically stores a pointer to large files in your repo rather than the files themselves.



Answer 2:

Note: a commit from Git 2.17 (Q2 2018) points out the existence of a hook (pre-auto-gc) that git gc --auto calls, and which can help minimize the effects of that command.

You can read more about git gc --auto in "Understanding git gc --auto".

A sample auto-gc hook (in contrib/) to skip auto-gc while on battery has been updated to almost always allow running auto-gc unless on_ac_power command is absolutely sure that we are on battery power (earlier, it skipped unless the command is sure that we are on ac power).

See commit 781262c (28 Feb 2018) by Adam Borowski (kilobyte).
(Merged by Junio C Hamano -- gitster -- in commit b423234, 14 Mar 2018)

hooks/pre-auto-gc-battery: allow gc to run on non-laptops

Desktops and servers tend to have no power sensor, thus on_ac_power returns 255 ("unknown"). Thus, let's take any answer other than 1 ("battery") as no contraindication to run gc.

If that tool returns "unknown", there's no point in querying other sources as it already queried them, and is smarter than us (can handle multiple adapters).

So, depending on your case, setting up that hook can give you a say as to whether git gc --auto should execute or not.
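
As a rough illustration of the decision described in that commit message, here is a minimal sketch of the logic a pre-auto-gc hook might use. It is not the contrib script itself: gc_allowed and power_status are hypothetical helper names, and the only facts assumed are that on_ac_power exits with 1 on battery and 255 when unknown, and that a non-zero exit from the hook makes git gc --auto skip its run.

```shell
#!/bin/sh
# Sketch of pre-auto-gc logic (would live in .git/hooks/pre-auto-gc, executable).

# gc_allowed STATUS: succeed unless STATUS is 1 ("on battery").
# 0 (AC power) and 255 (unknown) are both treated as fine to run gc.
gc_allowed() {
    [ "$1" -ne 1 ]
}

# Hypothetical wiring: ask on_ac_power if it is installed,
# otherwise report 255 ("unknown") so that gc is allowed.
power_status() {
    if command -v on_ac_power >/dev/null 2>&1; then
        on_ac_power
        return $?
    fi
    return 255
}

# In an actual hook file, the script would end with:
#   power_status; gc_allowed $? || exit 1
```

The same shape works for other conditions: any check that exits non-zero from the hook (time of day, a lock file, a flag set during working hours) would postpone the automatic collection.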



Tags: git git-gc