I checked a load of files into a branch and merged it, and then had to remove them. Now I'm left with a large .pack file that I don't know how to get rid of.
I deleted all the files using git rm -rf xxxxxx, and I also ran it with the --cached option.
Can someone tell me how I can remove a large .pack file that is currently in the following directory:
.git/objects/pack/pack-xxxxxxxxxxxxxxxxx.pack
Do I just need to remove the branch that I still have but am no longer using? Or is there something else I need to run?
I'm not sure how much difference it makes, but it shows a padlock against the file.
Thanks
EDIT
Here are some excerpts from my bash_history that should give an idea of how I managed to get into this state (assume at this point I'm working on a git branch called 'my-branch' and I've got a folder containing more folders/files):
git add .
git commit -m "Adding my branch changes to master"
git checkout master
git merge my-branch
git rm -rf unwanted_folder/
rm -rf unwanted_folder/ (not sure why I ran this as well but I did)
I thought I also ran the following, but it doesn't appear in the bash_history with the others:
git rm -rf --cached unwanted_folder/
I also thought I ran some git commands (like git gc) to try to tidy up the pack file, but they don't appear in the .bash_history file either.
The issue is that, even though you removed the files, they are still present in previous revisions. That's the whole point of git: even if you delete something, you can still get it back by accessing the history.
What you are looking to do is called rewriting history, and it involves the git filter-branch command. GitHub has a good explanation of the issue on their site: https://help.github.com/articles/remove-sensitive-data
To answer your question more directly, what you basically need to run is git filter-branch with unwanted_filename_or_folder replaced accordingly. This will remove all references to the files from the active history of the repo.
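As a rough sketch of that filter-branch step, here is a small shell function. The name purge_path is made up, and the flags are the commonly used ones for this job rather than the exact command from this answer. It rewrites every branch and tag so that the given path is removed from each commit's index, and drops commits that end up changing nothing. Run it on a fresh clone or a backup, because it rewrites history.

```shell
#!/bin/sh
# Hypothetical helper: remove one file or folder from every commit on every
# branch and tag. Run it from the top level of the repository, on a backup.
purge_path() {
  # --index-filter edits each commit's index without checking files out.
  # --ignore-unmatch keeps git rm from failing on commits without the path.
  # --prune-empty drops commits that become empty after the removal.
  # --tag-name-filter cat rewrites tags to point at the new commits.
  # FILTER_BRANCH_SQUELCH_WARNING=1 silences the warning (and the pause)
  # that newer versions of Git print before running filter-branch.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm -r -q --cached --ignore-unmatch -- '$1'" \
    --prune-empty --tag-name-filter cat -- --all
}

# Usage: purge_path unwanted_filename_or_folder
```

Afterwards the old commits are still kept alive by the backups under refs/original/ and by the reflogs, which is why the GC step that follows is needed before the pack actually shrinks.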
Next, perform a GC cycle to force all references to the file to be expired and purged from the packfile. Nothing needs to be replaced in this step.
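Here is one way that GC step could look, written as a hypothetical helper called purge_unreachable. Instead of deleting files under .git by hand, this sketch removes the refs/original/ backups with git update-ref, expires every reflog entry immediately, and then runs gc with --prune=now so unreachable objects are deleted straight away rather than after the default grace period.

```shell
#!/bin/sh
# Hypothetical helper: after a history rewrite, delete everything that still
# keeps the old commits alive, then let gc remove the unreachable objects.
purge_unreachable() {
  # filter-branch saves the pre-rewrite branch tips under refs/original/.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done
  # Reflog entries also reference the old commits, so expire all of them.
  # --prune=now then deletes unreachable objects immediately instead of
  # after the default two-week grace period; --aggressive repacks thoroughly.
  git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive
}

# Usage (nothing to adapt): purge_unreachable
```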
I'm a little late to the party, but in case the above answer doesn't solve the problem, I found another way: remove the specific large file from the history so that it is dropped from the .pack. I had this issue when I accidentally checked in a large 2GB file. I followed the steps explained in this link: http://www.ducea.com/2012/02/07/howto-completely-remove-a-file-from-git-history/
As loganfsmyth already stated in his answer, you need to purge git history because the files continue to exist there even after deleting them from the repo. Official GitHub docs recommend BFG, which I find easier to use than filter-branch:
Deleting files from history
Download BFG from their website. Make sure you have Java installed, then create a mirror clone and purge history. Make sure to replace YOUR_FILE_NAME with the name of the file you'd like to delete.
Delete a folder
Same as above, but use --delete-folders.
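The file and folder variants differ only in the BFG flag, so the whole mirror-clone workflow can be sketched as one shell function. The name bfg_purge, the jar location ./bfg.jar and the example URL are assumptions for illustration; --delete-files, --delete-folders and the reflog/gc clean-up afterwards are what the BFG documentation describes.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of BFG) around the mirror-clone workflow.
# Assumes Java is installed and the BFG jar was saved as ./bfg.jar.
#   $1  --delete-files or --delete-folders
#   $2  a bare file or folder name such as YOUR_FILE_NAME (not a path)
#   $3  the URL of the repository to clean
bfg_purge() {
  mode=$1 name=$2 url=$3
  # A bare mirror clone contains every branch and tag of the remote.
  git clone --mirror "$url" repo.git || return
  # Rewrite history. By default BFG leaves the contents of your latest
  # commit alone, so the file should already be deleted there.
  java -jar bfg.jar "$mode" "$name" repo.git || return
  # BFG only rewrites commits; expiring reflogs and running gc is what
  # actually deletes the old blobs from the pack.
  (
    cd repo.git &&
      git reflog expire --expire=now --all &&
      git gc --prune=now --aggressive
  )
}

# Usage (placeholders):
#   bfg_purge --delete-files YOUR_FILE_NAME https://example.com/you/repo.git
#   bfg_purge --delete-folders YOUR_FOLDER_NAME https://example.com/you/repo.git
# Then run `git push` from inside repo.git to publish the rewritten history.
```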
Other options
BFG also allows for even fancier options (see docs) like these:
Remove all files bigger than 100M from history:
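A sketch of the size-based variant, again as a made-up helper. It assumes ./bfg.jar has been downloaded and that a mirror clone already exists at the path you pass in; --strip-blobs-bigger-than is the BFG option for this.

```shell
#!/bin/sh
# Hypothetical helper: strip every blob larger than 100 MB from the history
# of an existing mirror clone ($1, made with `git clone --mirror`).
bfg_strip_big_blobs() {
  mirror=$1
  java -jar bfg.jar --strip-blobs-bigger-than 100M "$mirror" || return
  # As with the other BFG commands, the old blobs only disappear once the
  # reflogs are expired and gc prunes the now-unreachable objects.
  (
    cd "$mirror" &&
      git reflog expire --expire=now --all &&
      git gc --prune=now --aggressive
  )
}

# Usage: bfg_strip_big_blobs repo.git
```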
Important!
When running BFG, be careful that both YOUR_FILE_NAME and YOUR_FOLDER_NAME are indeed just file/folder names. They're not paths, so something like foo/bar.jpg will not work! Instead, all files/folders with the specified name will be removed from the repo history, no matter which path or branch they existed in.
One option:
Run git gc manually to condense a number of pack files into one or a few pack files. It can only drop the large files once they are no longer reachable from any branch, tag, or reflog; anything still in your history stays in the pack. The repacked result is persistent (the new pack keeps its compression), so it may be beneficial to compress a repository periodically with git gc --aggressive.
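To see whether a manual gc made any difference, one option is to compare the size-pack line reported by git count-objects before and after. The function name repack_and_report is invented for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: repack the current repository and print the total
# size of its pack files before and after.
repack_and_report() {
  echo "before:"
  git count-objects -v -H | grep size-pack
  git gc --aggressive --quiet
  echo "after:"
  git count-objects -v -H | grep size-pack
}

# Usage (from inside the repository): repack_and_report
```

If size-pack barely changes, the large files are most likely still referenced by some commit, and one of the history-rewriting approaches has to come first.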
Another option is to save the code and .git somewhere, then delete the .git folder and start again from the existing code by creating a new git repository (git init). The new repository starts with no history at all.
Scenario A: If your large files were only added to a branch, you don't need to run git filter-branch. You just need to delete the branch and run garbage collection.
Scenario B: However, based on your bash history, it looks like you did merge the changes into master. If you haven't shared the changes with anyone (no git push yet), the easiest thing would be to reset master back to before the merge with the branch that had the big files. This will eliminate all commits from your branch and all commits made to master after the merge, so you might lose changes (in addition to the big files) that you actually wanted. Then run the steps from Scenario A.
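Scenarios A and B could be sketched as two small shell functions. The names drop_branch_and_gc and reset_master_to are invented, and my-branch and <commit hash> are placeholders; check git log --oneline --graph --all before running anything. Expiring the reflogs matters in Scenario A, because otherwise gc's default expiry periods keep the deleted branch's objects around for a while.

```shell
#!/bin/sh
# Scenario A: the large files only ever lived on one branch.
drop_branch_and_gc() {
  branch=$1                      # e.g. my-branch
  git branch -D "$branch" || return
  # Reflogs still point at the deleted commits; expire them right away,
  # then let gc delete the unreachable objects and rewrite the packs.
  git reflog expire --expire=now --all &&
    git gc --prune=now
}

# Scenario B: move master back to where it was before the merge.
# WARNING: this discards every commit on master after that point.
reset_master_to() {
  commit=$1                      # the commit master pointed to before the merge
  git checkout master &&
    git reset --hard "$commit"
}

# Usage:
#   reset_master_to <commit hash>   # Scenario B only; `git reflog show master`
#                                   # lists where master pointed before the merge
#   drop_branch_and_gc my-branch    # Scenario A (and the last step of B)
```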
Scenario C: If there were other changes from the branch or changes on master after the merge that you want to keep, it would be best to rebase master and selectively include commits that you want:
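Scenario C might look like this. rebase_master_onto is an invented name, and <commit hash> stands for a commit on master from before the large files were added.

```shell
#!/bin/sh
# Hypothetical helper for Scenario C: replay master's commits after the
# given commit and let you drop the ones that added the large files.
rebase_master_onto() {
  commit=$1                      # a commit from before the large files
  git checkout master &&
    git rebase -i "$commit"
}

# Usage: rebase_master_onto <commit hash>
# Your editor opens a todo list with one "pick <hash> <subject>" line
# per commit after <commit hash>.
```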
In your editor, remove the lines that correspond to the commits that added the large files, but leave everything else as is. Save and quit. Your master branch should now contain only what you want, and no large files. Note that git rebase without -p will eliminate merge commits, so you'll be left with a linear history for master after <commit hash>. This is probably okay for you, but if not, you could try with -p, although git help rebase says "combining -p with the -i option explicitly is generally not a good idea unless you know what you are doing". (In Git 2.34 and later, -p/--preserve-merges no longer exists; use --rebase-merges, or -r, instead.)
Then run the commands from Scenario A.