Objects folder in .git is extremely large for my s

Posted 2019-07-04 05:06

My git push was very slow so I investigated and found out that the folder .git/objects takes up ~450MB.

The complete project is only ~6MB, but I had added archives that were 140MB in size. As GitHub doesn't allow files that large, I removed them, then ran git add -A and tried to commit again, but it takes a very long time and seems to upload a lot.

It takes forever at:

writing objects:  97% (35/36), 210.17 MiB | 49.00 KiB/s 

How do I fix my git repository?

1 answer
仙女界的扛把子
#2 · 2019-07-04 05:44

Commits are "forever"

Remember that Git saves every commit "forever" (not to worry, there is a reason this is in quotes!).

This means that when you added the archive and committed, you put it in, and then when you removed the archive and committed, you put in an additional commit that says "now that you've saved this huge archive forever, take it out of the working version because we don't need it" ... but it's still in the permanent audit trail, the "every commit forever" record. So it's in your repository and you'll "have to" push it. Again, note the quotes around "have to".
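To see this for yourself, here is a minimal sketch in a throwaway repository. The file names, commit messages, and the 1 MiB stand-in for the "giant" archive are assumptions made for the demo, not taken from the question.

```shell
#!/bin/sh
# Sketch: a file deleted in a later commit is still stored in history.
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "code" > main.txt
git add -A && git commit -q -m "initial commit"

# Commit a "giant" file (1 MiB here to keep the demo quick), then remove it.
head -c 1048576 /dev/urandom > big.bin
git add -A && git commit -q -m "add archive"
git rm -q big.bin
git commit -q -m "rm archive"

# The working tree no longer has the file ...
[ ! -e big.bin ] && echo "big.bin is gone from the working tree"

# ... but the blob is still reachable from the branch history,
# so a push still has to send it.
in_history=$(git rev-list --objects HEAD | grep -c 'big.bin' || true)
echo "paths named big.bin reachable from HEAD: $in_history"

# Overall size of the object store (.git/objects).
git count-objects -vH
```

Running git count-objects -vH in your real repository is a quick way to confirm that the object store, not the working tree, is what holds the extra data.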

Forever is only as long as you (and everyone else with a copy) want it to be

See "How to remove/delete a large file from commit history in Git repository?" Your question is basically a duplicate, but before you go to the answers there, note that you haven't yet successfully pushed these commits, because GitHub defaults to saying "No, that's really huge." This means you can use rewriting operations freely: no one else has a copy of your commits yet.

For cases like this one, I generally think git rebase -i is the easiest recipe for cleaning up. In particular, suppose you have this sequence of commits in your git rebase -i edit recipe:

pick a123456 add feature foo
pick b123456 rm giant file accidentally added in a123456
pick ...

In this particular case, the mistake ("add giant file") and fix ("remove giant file") are right next to each other. Suppose you could tell Git: "Just combine the two commits into one commit, that does what would happen if you did the first commit and then the second one." That is, let's do everything including adding the giant file, but then before committing, let's also do the remove-giant-file, and only then commit.

Well, "squash" and "fixup" are the two commands that tell Git exactly that. Just change pick to either squash or fixup. The only difference between these is whether you get a chance to edit the commit message for the new combined commit:

  • squash: make a combined commit, and bring up the editor on the log message, which initially contains both of the original log messages.
  • fixup: make a combined commit, but use the log message from the non-fixup commit, discarding the fixup's message entirely.
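As a rough sketch of the fixup case, the script below builds a throwaway repository and lets sed make the pick-to-fixup edit that you would normally type into the editor. The repository contents are assumptions for the demo; GIT_SEQUENCE_EDITOR is the environment variable Git consults for the program that edits the rebase todo list.

```shell
#!/bin/sh
# Sketch: fold the "rm giant file" commit into the commit that added the file.
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > base.txt
git add -A && git commit -q -m "base"

# The mistake: a feature commit that also drags in a large file.
echo "foo" > foo.txt
head -c 1048576 /dev/urandom > big.bin
git add -A && git commit -q -m "add feature foo"

# The fix, committed right after the mistake.
git rm -q big.bin
git commit -q -m "rm giant file accidentally added"

# By hand you would run `git rebase -i HEAD~2` and change the second
# "pick" to "fixup". Here sed applies that same change to the todo file.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" git rebase -q -i HEAD~2

commits=$(git rev-list --count HEAD)
leftover=$(git rev-list --objects HEAD | grep -c 'big.bin' || true)
subject=$(git log -1 --format=%s)
echo "commits: $commits, big.bin entries in history: $leftover, tip: $subject"
```

Because fixup was used, the combined commit keeps the "add feature foo" message; with squash, Git would open the editor on both messages instead.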

All history-edit operations copy commits

As we noted at the top, commits are forever. They can't be changed. What rebase does (and the "BFG" mentioned in the linked question's answers too) is copy the bad commits to new, slightly different, better-(we-hope), commits. And then after the copy is done, we have Git shove aside the bad commits and make the branch-name point to the new copies:

A--B--C--D--E   <-- master

Oops, commit C was bad and we made D to remove the big file, and then we made unrelated fix E as well. So now we copy commits C-through-E (we have to copy E because we have to copy everything from the bad point forward) to newer, better commits:

     C--D--E   [abandoned]
    /
A--B--C'--E'   <-- master

We shove the original C-D-E chain out of the way and use our new C' (copy of C that has D squashed or fixup-ed into it) and E' (copy of E) and make the branch name, master in this case, point to E' instead of to E. Now we can push, or maybe force-push.

We'd have to force-push if we had already managed to push successfully. If we have to force-push, that means someone else might have already snagged our bad C-D-E chain and might be using it, in which case they will have to recover too. If they do it wrong, they may even bring C-D-E back! (Usually by merging.)

But if no one else has C-D-E, we can abandon them and know they'll never come back (unless we go search for them). So now we're free to (non-force) push the corrected C'-E' chain.
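The copy-and-abandon behaviour can be checked in a scratch repository. The sketch below assumes the same A-B-C-D-E shape as the diagram; the file names and messages are made up for illustration.

```shell
#!/bin/sh
# Sketch: rebase copies commits; the originals are abandoned, not edited.
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "a" > a.txt;  git add -A && git commit -q -m "A"
echo "b" > b.txt;  git add -A && git commit -q -m "B"
head -c 1048576 /dev/urandom > big.bin
echo "c" > c.txt;  git add -A && git commit -q -m "C: feature plus big file"
git rm -q big.bin; git commit -q -m "D: remove big file"
echo "e" > e.txt;  git add -A && git commit -q -m "E: unrelated fix"

old_c=$(git rev-parse HEAD~2)
old_e=$(git rev-parse HEAD)
old_tree=$(git rev-parse "$old_e^{tree}")

# Fix up D into C; E has to be copied too because its parent changes.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" git rebase -q -i HEAD~3

new_e=$(git rev-parse HEAD)
new_tree=$(git rev-parse "HEAD^{tree}")

# The branch no longer contains the original C ...
if git merge-base --is-ancestor "$old_c" HEAD; then c_in_branch=yes; else c_in_branch=no; fi
# ... but the abandoned E is still in the object store for anyone who looks.
if git cat-file -e "$old_e^{commit}"; then old_exists=yes; else old_exists=no; fi

echo "E  = $old_e"
echo "E' = $new_e"
echo "original C still in branch: $c_in_branch; old E still stored: $old_exists"
```

E and E' have different hashes but identical trees: the copy has the same files, only its ancestry differs.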

If your fix is not nicely in order

The above is great if your commit that fixes your bad commit comes right after the bad one:

pick a123456 add feature foo
pick b123456 rm giant file accidentally added in a123456
pick ...

but maybe the "rm giant file" commit does not come right after the bad commit, or maybe it has other things mixed in. If you can simply re-order the commits so that a lone "rm giant file" commit does come right after the mistake, that's easy, just follow the rebase -i instructions. You can do two rebases: one to reorder, and one to squash/fixup; or you can try the reorder-and-squash/fixup all in one go, if you prefer.
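Here is one way the reorder-and-fixup-in-one-go variant might look. It is only a sketch: the helper script reorder-todo.sh is a hypothetical name invented for this demo, and it assumes the todo list has exactly the three picks shown in the comments. In practice you would make the same edit by hand in the editor.

```shell
#!/bin/sh
# Sketch: move a lone "rm giant file" commit up next to the mistake and fix it up.
set -eu
work=$(mktemp -d)                 # scratch area for the demo
mkdir "$work/repo"
cd "$work/repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > base.txt
git add -A && git commit -q -m "base"

echo "foo" > foo.txt
head -c 1048576 /dev/urandom > big.bin
git add -A && git commit -q -m "add feature foo"        # todo line 1

echo "other" > other.txt
git add -A && git commit -q -m "unrelated change"       # todo line 2

git rm -q big.bin
git commit -q -m "rm giant file"                        # todo line 3

# Hypothetical helper: rewrite the todo so line 3 becomes a fixup
# directly after line 1, and the old line 2 follows it.
cat > "$work/reorder-todo.sh" <<'EOF'
awk 'NR==2 { held = $0; next }
     NR==3 { sub(/^pick/, "fixup"); print; print held; next }
     { print }' "$1" > "$1.new" && mv "$1.new" "$1"
EOF

GIT_SEQUENCE_EDITOR="sh $work/reorder-todo.sh" git rebase -q -i HEAD~3

commits=$(git rev-list --count HEAD)
leftover=$(git rev-list --objects HEAD | grep -c 'big.bin' || true)
tip=$(git log -1 --format=%s)
echo "commits: $commits, big.bin entries in history: $leftover, tip: $tip"
```

Reordering works cleanly here because the unrelated commit touches a different file; if the commits overlap, expect the rebase to stop for conflict resolution.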

If not ... well, this is when you may want to go for filter-branch or the BFG mentioned in the other (linked) question: these can do complicated surgery on commits. Interactive rebase only does simple stuff on its own, and leaves complicated methods to you (you can use git commit --amend in the middle of the interactive rebase, or do multiple additional commits, for instance).
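For the filter-branch route, a minimal sketch follows. It uses the index-filter form from the git filter-branch documentation on a scratch repository where the removal is mixed in with other changes; the file names are assumptions, and FILTER_BRANCH_SQUELCH_WARNING only silences the notice that newer Git versions print before running.

```shell
#!/bin/sh
# Sketch: strip one path from every commit on the current branch.
set -eu
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > base.txt
git add -A && git commit -q -m "base"

echo "foo" > foo.txt
head -c 1048576 /dev/urandom > big.bin
git add -A && git commit -q -m "add feature foo"

echo "more foo" >> foo.txt
git add -A && git commit -q -m "extend foo"             # big.bin still present

git rm -q big.bin
echo "notes" > notes.txt
git add -A && git commit -q -m "cleanup mixed with other work"

# Rewrite each commit's index without big.bin. The pre-rewrite branch tip
# is kept under refs/original/ as a backup.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' HEAD

commits=$(git rev-list --count HEAD)
leftover=$(git rev-list --objects HEAD | grep -c 'big.bin' || true)
echo "commits: $commits, big.bin entries in history: $leftover"
```

As with rebase, this copies commits rather than changing them, so the same caveats about force-pushing apply if the old history was already shared.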
