svndumpfilter --drop-empty-revs keeps padding revisions

Posted 2019-04-28 13:16

Question:

We are splitting a big svn repository (100k+ revs) into several smaller repos. I am using svndumpfilter (v1.7.2) to split the dump and svndumptool/sed to filter the big dump.

Everything works fine, except that there are still some "padding revisions" in my filtered dump, even though I used the --drop-empty-revs option.

This is not too problematic when fewer than 10% of the revisions are useless "padding revisions", but sometimes the new repo has only a few hundred real revisions buried in 30k+ "padding revisions".

Here is the command I use, followed by a sample of the revisions that end up in the result:

svndumpfilter --drop-empty-revs --renumber-revs include /MyProj < MassiveOldRepo.dump > NewAllCleanRepo.dump

------------------------------------------------------------------------
r3453 | (no author) | 2005-09-29 17:27:54 +0200 (jeu., 29 sept. 2005) | 1 line

This is an empty revision for padding.
------------------------------------------------------------------------
r3454 | (no author) | 2005-09-29 17:28:27 +0200 (jeu., 29 sept. 2005) | 1 line

This is an empty revision for padding.
------------------------------------------------------------------------    

I would like to know if there is a way to exclude these revisions while filtering the dump (without manually removing them from the filtered dump afterwards).

EDIT: I should add that my use of svndumpfilter does drop some empty revisions: the ones before the first "real" revision and the ones after the last "real" revision.

Answer 1:

I had the same problem with empty revisions that were already present in the repository. Since Subversion 1.7 there has been a still-undocumented switch that filters out all empty revisions.

svndumpfilter --drop-all-empty-revs include / < oldrepos.dump > newrepos.dump

More information can be found at grokbase.
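
Because the switch is described above as undocumented and version-dependent, it may be worth checking whether the installed svndumpfilter knows it before running a long filter job. The following is a minimal sketch, not an official procedure; the function name supports_drop_all_empty_revs is made up for illustration, and it assumes that the output of "svndumpfilter help include" lists the options your build accepts.

```shell
# Hypothetical helper: reads svndumpfilter help text on stdin and
# succeeds (exit status 0) if the text mentions --drop-all-empty-revs.
supports_drop_all_empty_revs() {
    grep -q -e '--drop-all-empty-revs'
}

# Usage (requires svndumpfilter on PATH):
#   svndumpfilter help include | supports_drop_all_empty_revs \
#       && echo "switch available" \
#       || echo "switch not listed by this version"
```

If the switch is not listed, the command from this answer will most likely be rejected by that version.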



Answer 2:

After hours of testing and reading the svndumpfilter source code (which is very well commented, well done!), I realized that these empty revisions don't come from my filtering.

They were already present in my original dump and date from 2005.

Conclusion: check your data first!
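
One way to do that check is to count the padding log messages in the original dump before filtering anything. This is a rough sketch under some assumptions: the function names are invented for illustration, the message text is the one shown in the question's log output, and a plain text search can over-count if a versioned file happens to contain the same line.

```shell
# Hypothetical helper: count lines carrying the padding log message
# in a dump file. -a makes grep treat the (partly binary) dump as text.
count_padding_revs() {
    grep -a -c 'This is an empty revision for padding\.' "$1"
}

# Hypothetical helper: count revision records in the same dump,
# for comparison with the padding count.
count_all_revs() {
    grep -a -c '^Revision-number: [0-9][0-9]*$' "$1"
}

# Usage (file name taken from the question):
#   count_padding_revs MassiveOldRepo.dump
#   count_all_revs MassiveOldRepo.dump
```

If the first number is already large for the unfiltered dump, the padding revisions predate the filtering.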



Answer 3:

I can't be sure about your situation, but in my case, it WAS the filtering that caused thousands of those padding messages to show up in the log. I resolved it by including these two switches:

--drop-empty-revs --renumber-revs

The second switch works like this: if your filter included, for example, revs 1000-1200 but then excluded 1201-5000, the next rev it includes will be numbered 1201 rather than 5001, which would otherwise cause the creation of a few thousand empty padding revs.
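
The arithmetic behind that example can be sketched as follows. This only illustrates the numbers in the answer and is not a description of svndumpfilter's internals; the function name gap_size is hypothetical.

```shell
# Hypothetical helper: number of revisions strictly between the last
# included revision ($1) and the next included revision ($2). Per the
# answer, this is how many padding revisions such a gap would produce
# if the original numbering were preserved.
gap_size() {
    echo $(( $2 - $1 - 1 ))
}

# The answer's example: 1200 is the last included rev, 5001 the next.
gap_size 1200 5001   # prints 3800
```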