Nuking huge file in svn repository

Posted 2019-03-12 20:41

Question:

As the local Subversion czar, I tell everyone to keep only source code and non-huge text files in the repository, not huge binary data files. Smaller binary files that are parts of tests, maybe.

Unfortunately I work with humans! Someday someone is likely to accidentally commit an 800MB binary hulk. This slows down repository operations.

Last time I checked, you can't delete a file from the repository; you can only make it not part of the latest revision. The repository keeps the monster for all eternity, in case anyone ever wants to recall the state of the repository for that date or revision number.

Is there a way to really delete that monster file and end up with a decent-sized repository? I've tried the svnadmin dump/load thing, but it was a pain.

Answer 1:

Some extra info about this can be found at the blog post: Subversion Obliterate, the missing feature

Be sure to read through the comments too, where Karl Fogel puts the article into perspective :-)



Answer 2:

To permanently delete monster files from a svn repository, there is no other solution than using svnadmin dump/load. (SVN Book: dump command)

To prevent huge files from being committed, a hook script can be used. You could have, for example, a script that ran "pre-commit" whenever someone tried to commit to the repository. The script might check filesize, or filetype, and reject the commit if it contained a file or files that were too large, or of a "forbidden" type.

More typical uses of hook scripts are to check (pre-commit) that a commit contains a log message, or (post-commit) to email details of the commit or to update a website with the newly committed files.

A hook script is a script that runs in response to repository events (SVN Book: Create hooks).
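The size check described above could look roughly like the following pre-commit hook. This is only a sketch under assumptions: the 10 MiB limit, the MAX_BYTES and SVNLOOK variables, and the oversized_paths function name are made up for illustration, and it relies on the svnlook filesize subcommand, which needs Subversion 1.7 or later.

```shell
#!/bin/sh
# Hypothetical pre-commit hook sketch: reject commits containing oversized files.
# Subversion calls pre-commit with two arguments: REPOS-PATH and TXN-NAME.

MAX_BYTES=${MAX_BYTES:-10485760}   # assumed limit (10 MiB), adjust to taste
SVNLOOK=${SVNLOOK:-svnlook}        # overridable, e.g. /usr/bin/svnlook

# Print every added or modified file in the transaction that exceeds MAX_BYTES.
oversized_paths() {
    repos=$1
    txn=$2
    "$SVNLOOK" changed -t "$txn" "$repos" | while IFS= read -r line; do
        case $line in D*) continue ;; esac   # deletions carry no content
        path=${line#????}                    # drop the 4-column status prefix
        case $path in */) continue ;; esac   # directories end with a slash
        size=$("$SVNLOOK" filesize -t "$txn" "$repos" "$path" 2>/dev/null) || continue
        if [ "$size" -gt "$MAX_BYTES" ]; then
            printf '%s (%s bytes)\n' "$path" "$size"
        fi
    done
}

# Only act when invoked as a hook, i.e. with REPOS-PATH and TXN-NAME.
if [ "$#" -ge 2 ]; then
    bad=$(oversized_paths "$1" "$2")
    if [ -n "$bad" ]; then
        # Anything written to stderr is shown to the committing user.
        echo "Commit rejected: files larger than $MAX_BYTES bytes:" >&2
        echo "$bad" >&2
        exit 1
    fi
fi
```

To try it, the script would be saved as hooks/pre-commit inside the repository directory and made executable. A non-zero exit status makes Subversion abort the commit.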



Answer 3:

If you can catch it as soon as it's committed, the svnadmin dump/load technique isn't too painful. Suppose someone just accidentally committed gormundous-raw-image.psd in Revision 3849. You could do this:

svnadmin dump /var/repos -r 1:3848 > ~/repos_dump

That would create a dump file containing everything up to and including Revision 3848. At that point, you could use svnadmin create and svnadmin load to reconstitute the repository without the offending commit, the caveat being that any changes you made within the repository's directory structure--hooks, symlinks, permission changes, auth files, etc.--would need to be copied over from the old directory. Here's an example of the rest of the bash session you might use to complete the operation:

svnadmin create /var/repos-new
svnadmin load /var/repos-new < ~/repos_dump
cp -r /var/repos/conf /var/repos-new
cp -r /var/repos/hooks /var/repos-new
mv /var/repos{,-old} && mv /var/repos-new /var/repos

I'm sure this will be more painful the more history your repository has, but it does work.
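If the offending file is buried under later commits, cutting the dump off at an earlier revision would also throw away everything committed since. One possible variation, not part of the session above, is to pipe the dump through svndumpfilter exclude, which drops a path prefix from the stream. The following is a rough sketch: the filter_repo function and the SVNADMIN and SVNDUMPFILTER variables are hypothetical names, and svndumpfilter may refuse to proceed if the excluded path was later copied to a path that is kept.

```shell
#!/bin/sh
# Sketch: rebuild a repository without one path prefix, keeping later revisions.
# Function and variable names are illustrative only.

SVNADMIN=${SVNADMIN:-svnadmin}
SVNDUMPFILTER=${SVNDUMPFILTER:-svndumpfilter}

# filter_repo OLD_REPOS NEW_REPOS PATH_PREFIX
filter_repo() {
    old=$1
    new=$2
    prefix=$3
    # Create the empty target repository first.
    "$SVNADMIN" create "$new" || return 1
    # Stream the full history, drop the unwanted path, load the rest.
    "$SVNADMIN" dump --quiet "$old" \
        | "$SVNDUMPFILTER" exclude "$prefix" \
        | "$SVNADMIN" load --quiet "$new"
}
```

A call might look like filter_repo /var/repos /var/repos-new trunk/gormundous-raw-image.psd, where the trunk/ location is a guess at where the file lives. The conf and hooks directories would still need to be copied over and the repositories swapped, as in the session above.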



Answer 4:

Once you have removed the file from your HEAD revision, it no longer slows down everyday operations, as only deltas between revisions are handled. (Repository backups must, of course, still handle the load.)