How to remove a big (>100 MB) file from a GitHub repo

Published 2019-01-20 07:37

Question:

I am in the same situation as described here: I inadvertently added a big file that I don't want, and then made additional commits of other work on top of it (not knowing the push would fail):

Am I supposed to run BFG on the mirrored repo or the original?


ATTEMPT #1 Tried this to remove the file:

git rm bigfile
git commit bigfile
git push

No luck. The push was still stuck on trying to upload the big file even though the later commit deleted it:

$ git push

Username for 'https://github.com':
Password for 'https://traildreaming@github.com':
Counting objects: 210, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (66/66), done.
Writing objects: 100% (210/210), 5.72 MiB | 1.47 MiB/s, done.
Total 210 (delta 147), reused 203 (delta 140)
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: eedddea1fcb95663492e16c14fc3a250
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File doc/image.eps is 591.70 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/traildreaming/myrepo.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/traildreaming/myrepo.git'

ATTEMPT #2 Tried the instructions for https://rtyley.github.io/bfg-repo-cleaner/

But it does not see my big files which are preventing me from doing a push:

$ git clone --mirror https://github.com/traildreaming/myrepo.git

Cloning into bare repository 'myrepo.git'...
Username for 'https://github.com':
Password for 'https://traildreaming@github.com':
remote: Counting objects: 20471, done.
remote: Total 20471 (delta 0), reused 0 (delta 0), pack-reused 20471
Receiving objects: 100% (20471/20471), 812.92 MiB | 4.00 MiB/s, done.
Resolving deltas: 100% (14464/14464), done.
Checking connectivity... done.

$ cp -fr myrepo.git myrepo.git.bac

note2@Travel-2015-11 /cygdrive/c/Users/note2/Data/git/tmpmirror
$ java -jar ../bfg-1.12.12.jar --strip-blobs-bigger-than 100M myrepo.git

Using repo : C:\Users\note2\Data\git\tmpmirror\myrepo.git

Scanning packfile for large blobs: 20471
Scanning packfile for large blobs completed in 103 ms.
Warning : no large blobs matching criteria found in packfiles - does the repo need to be packed?
Please specify tasks for The BFG :
bfg 1.12.12

ATTEMPT #3 Trying this resulted in "remote: error:" messages:

$ git clone --mirror ../../myrepo/.git

Cloning into bare repository 'myrepo.git'...
done.

$ java -jar bfg-1.12.12.jar --strip-blobs-bigger-than 100M tmpmirror/myrepo/myrepo.git

Using repo : C:\Users\note2\Data\git\tmpmirror\myrepo\myrepo.git

Scanning packfile for large blobs: 12545
Scanning packfile for large blobs completed in 66 ms.
Found 1 blob ids for large blobs - biggest=620441479 smallest=620441479
Total size (unpacked)=620441479
Found 1322 objects to protect
Found 4 commit-pointing refs : HEAD, refs/heads/master, refs/remotes/origin/HEAD, refs/remotes/origin/master

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit b68c0cbc (protected by 'HEAD')

Cleaning
--------

Found 2769 commits
Cleaning commits:       100% (2769/2769)
Cleaning commits completed in 1,485 ms.

Updating 1 Ref
--------------

        Ref                 Before     After
        ---------------------------------------
        refs/heads/master | b68c0cbc | 49823acc

Updating references:    100% (1/1)
...Ref update completed in 18 ms.

Commit Tree-Dirt History
------------------------

        Earliest                                              Latest
        |                                                          |
        ...........................................................D

        D = dirty commits (file tree fixed)
        m = modified commits (commit message or parents changed)
        . = clean commits (no changes to file tree)

                                Before     After
        -------------------------------------------
        First modified commit | 0ef7f866 | e3d74aee
        Last dirty commit     | 338d2b46 | 01ca7b80

Deleted files
-------------

        Filename                     Git id
        ------------------------------------------------
        image.eps | e12fe50b (591.7 MB)


In total, 50 object ids were changed. Full details are logged here:

        C:\Users\note2\Data\git\tmpmirror\myrepo\myrepo.git.bfg-report\2016-06-11\15-59-30

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

$ git reflog expire --expire=now --all && git gc --prune=now --aggressive

Counting objects: 20681, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (20114/20114), done.
Writing objects: 100% (20681/20681), done.
Total 20681 (delta 14625), reused 3226 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git push

Counting objects: 210, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (82/82), done.
Writing objects: 100% (210/210), 1.81 MiB | 0 bytes/s, done.
Total 210 (delta 147), reused 185 (delta 124)
remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.
remote: error:
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
remote: error: its current branch; however, this is not recommended unless you
remote: error: arranged to update its work tree to match what you pushed in some
remote: error: other way.
remote: error:
remote: error: To squelch this message and still keep the default behaviour, set
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
To /cygdrive/c/Users/note2/Data/git/tmpmirror/myrepo/../../myrepo/.git
 ! [remote rejected] master -> master (branch is currently checked out)
error: failed to push some refs to '/cygdrive/c/Users/note2/Data/git/tmpmirror/myrepo/../../myrepo/.git'

Answer 1:

Even though you have removed the file in the latest commit you still have a copy of it in your history. I think you're going to want to remove it from git completely.
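
To make that concrete, here is a minimal sketch in a throwaway repo. The temp directory, the demo identity, and the 4 KB stand-in file are all made up for illustration. It commits a file, removes it in a later commit, and then checks both HEAD and the full history:

```shell
#!/usr/bin/env bash
# Sketch: a later "git rm" does not remove the blob from history.
set -eu

repo=$(mktemp -d)                  # throwaway repository (demo only)
cd "$repo"
git init -q .
# Pass an identity inline so the demo does not depend on global config.
git_c() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

mkdir doc
head -c 4096 /dev/zero > doc/image.eps   # small stand-in for the big file
git add doc/image.eps
git_c commit -q -m "add image"

git rm -q doc/image.eps
git_c commit -q -m "remove image"

# The file is gone from the tree of HEAD ...
in_head=$(git ls-tree -r --name-only HEAD | grep -c 'image.eps' || true)
# ... but the blob is still reachable through the first commit.
in_history=$(git rev-list --objects --all | grep -c 'doc/image.eps' || true)

echo "in HEAD: $in_head, in history: $in_history"
```

A push sends every object reachable from the branch, so the blob from the first commit still goes over the wire and still trips the size check.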

You'll probably want to rebase it out. To find out when you introduced it you could do:

git log --diff-filter=A --format=%H -- doc/image.eps
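
One thing to watch when looking up that commit: git applies -n before --reverse, so "git log --reverse -n1" prints the newest commit touching the path, not the oldest. A small sketch in a throwaway repo (demo identity and file contents are invented) compares it with --diff-filter=A, which selects only the commit that added the path:

```shell
#!/usr/bin/env bash
# Sketch: find the commit that first added a path.
set -eu

repo=$(mktemp -d)                  # throwaway repository (demo only)
cd "$repo"
git init -q .
git_c() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

mkdir doc
echo v1 > doc/image.eps
git add doc/image.eps
git_c commit -q -m "add image"
first=$(git rev-parse HEAD)

echo v2 > doc/image.eps
git_c commit -q -am "update image"

git rm -q doc/image.eps
git_c commit -q -m "remove image"
last=$(git rev-parse HEAD)

# -n1 limits to the newest matching commit first; --reverse then has
# only that single commit to reorder.
naive=$(git log --reverse -n1 --format=%H -- doc/image.eps)

# --diff-filter=A keeps only commits where the path was Added.
added=$(git log --diff-filter=A --format=%H -- doc/image.eps)

echo "naive: $naive"
echo "added: $added"
```

In a real repository only the --diff-filter=A line is needed; it prints the SHA to use in the next step.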

Then copy the SHA it gives you and do an interactive rebase:

git rebase -i sha~1

Keep the ~1 in the above command, but replace the sha with the actual SHA from the earlier command output. If the above command doesn't work you may need to set an EDITOR, e.g.:

EDITOR=vim git rebase -i sha~1

Replace vim with any command line editor you're comfortable with (emacs, nano, etc). You can get it to work with GUI editors like atom but you may need to pass in additional arguments to force the process to wait until you close the window. If you use atom you could run:

EDITOR="atom --wait" git rebase -i sha~1

This is going to take you way back in time. The very first line will start with pick; change that word to edit, then save and exit your editor. Do not change any other picks.

This will put you back at the commit that introduced the large file. You can now remove it from git:

git rm doc/image.eps && git commit --amend

Then continue the rebase:

git rebase --continue

If this goes all the way to completion, then you're done. You should be able to git push. However, if it doesn't, then you may have updated the image in a later commit. You'll want to do the same git rm doc/image.eps && git commit --amend && git rebase --continue that we did above every time it stops.

I'm assuming quite a few things so I hope you're comfortable enough with git, editors, and the command line to use this information.

P.S. there is likely a much shorter and more succinct way to do this, but since you're asking this question I'm assuming you don't want a magical git command that will rip thru your history on its own. So first, let's try it step by step.
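
Before pushing, it can help to check whether any oversized blob is still reachable from your refs. Below is a rough sketch of one way to list them; the throwaway repo, the file names, and the 1000-byte limit are demo values I made up. For GitHub you would presumably set the limit to 100*1024*1024, since the 591.70 MB in the error message matches the 620441479 bytes BFG reports only when MB means 1024*1024 bytes.

```shell
#!/usr/bin/env bash
# Sketch: list blobs in reachable history that exceed a size limit.
set -eu

repo=$(mktemp -d)                  # throwaway repository (demo only)
cd "$repo"
git init -q .
git_c() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

head -c 2048 /dev/zero > big.bin   # stand-in for an oversized file
echo small > small.txt
git add .
git_c commit -q -m "add files"
git rm -q big.bin
git_c commit -q -m "remove big file"

limit=1000                         # bytes; demo value, not GitHub's limit

# rev-list prints "<sha> <path>" for every reachable object; cat-file
# adds type and size; awk keeps blobs above the limit.
# Note: paths containing spaces are cut at the first space here.
oversized=$(
  git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk -v limit="$limit" '$1 == "blob" && $2 > limit { print $2, $3 }'
)

echo "$oversized"
```

If this prints nothing for your real limit, no reachable commit carries a blob that large. Objects left behind only in the reflog are not listed, and they are not pushed either.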



Answer 2:

Here is how I got it to work after the "git push" got stuck due to adding and committing a big file and then continuing committing with other work while away from the internet:

I downloaded bfg*jar from:
https://rtyley.github.io/bfg-repo-cleaner/

cd tmpmirror; mkdir myrepo; cd myrepo; git clone --mirror ../../myrepo/.git
java -jar bfg-1.12.12.jar --strip-blobs-bigger-than 100M myrepo.git
cd myrepo.git; git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push https://github.com/traildreaming/myrepo
cd ../../..
mv myrepo myrepo_old
git clone https://github.com/traildreaming/myrepo
cd myrepo

If you get this message, then try with the extra steps from below

$ java -jar ../../bfg-1.12.13.jar --strip-blobs-bigger-than 100M myrepo.git

Using repo : [DIR]\tmpmirror\myrepo\myrepo.git

Scanning packfile for large blobs: 20681
Scanning packfile for large blobs completed in 135 ms.
Warning : no large blobs matching criteria found in packfiles - does the repo need to be packed?
Please specify tasks for The BFG :
bfg 1.12.13
Usage: bfg [options] [<repo>]

  -b <size> | --strip-blobs-bigger-than <size>
        strip blobs bigger than X (eg '128K', '1M', etc)
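
The warning mentions packfiles, which suggests BFG only looks at packed objects; that is my reading of the message, not something the output states outright. A mirror clone made from a local path can carry its objects over as loose files, so nothing is packed yet. A small sketch (throwaway repos, demo identity and file are invented) shows the in-pack count before and after git repack:

```shell
#!/usr/bin/env bash
# Sketch: objects in a fresh local mirror clone may be loose until repacked.
set -eu

src=$(mktemp -d)                   # throwaway source repository (demo only)
cd "$src"
git init -q .
git_c() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

head -c 2048 /dev/zero > big.bin
git add big.bin
git_c commit -q -m "add file"

mirror="$(mktemp -d)/mirror.git"
git clone -q --mirror "$src/.git" "$mirror"   # local path, as in the answer
cd "$mirror"

# "in-pack" is the number of objects stored in packfiles.
packed_before=$(git count-objects -v | awk '/^in-pack:/ { print $2 }')
git repack -q
packed_after=$(git count-objects -v | awk '/^in-pack:/ { print $2 }')

echo "in-pack before: $packed_before, after: $packed_after"
```

If the count is zero before the repack in your own mirror, that would explain why BFG found no large blobs to strip.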

In that case, repack the mirror before running BFG (note the added git repack line):

cd tmpmirror; mkdir myrepo; cd myrepo; git clone --mirror ../../myrepo/.git
cd myrepo.git; git repack; cd ..
java -jar bfg-1.12.12.jar --strip-blobs-bigger-than 100M myrepo.git
cd myrepo.git; git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push https://github.com/traildreaming/myrepo
cd ../../..
mv myrepo myrepo_old
git clone https://github.com/traildreaming/myrepo
cd myrepo

And then continue working in the newly cloned repo. Thanks to the advice in "Am I supposed to run BFG on the mirrored repo or the original?" to use "git push https://github.com/traildreaming/myrepo" and not "git push".