We are splitting a big svn repository (100k+ revs) into several smaller repos. I am using svndumpfilter (v1.7.2) to split the dump and svndumptool/sed to filter the big dump.
Everything works fine, except that there are still some "padding revisions" in my filtered dump, even though I used the --drop-empty-revs option.
This is not too problematic when fewer than 10% of the revisions are useless "padding revisions", but sometimes the new repo has only a few hundred real revisions buried in 30k+ padding revisions.
Here is the command I use and the revisions that are included:
svndumpfilter --drop-empty-revs --renumber-revs include /MyProj < MassiveOldRepo.dump > NewAllCleanRepo.dump
------------------------------------------------------------------------
r3453 | (no author) | 2005-09-29 17:27:54 +0200 (jeu., 29 sept. 2005) | 1 line
This is an empty revision for padding.
------------------------------------------------------------------------
r3454 | (no author) | 2005-09-29 17:28:27 +0200 (jeu., 29 sept. 2005) | 1 line
This is an empty revision for padding.
------------------------------------------------------------------------
Is there a way to keep these revisions out while filtering the dump, without having to remove them manually from the filtered dump afterwards?
EDIT: I should add that svndumpfilter, as I use it, does drop some empty revisions: the ones before the first "real" revision and the ones after the last "real" revision.
I had the same problem with empty revisions that were already present in the repository. Since Subversion 1.7 there has been a still-undocumented switch that filters out all empty revisions.
More information can be found at grokbase.
After hours of testing and reading the svndumpfilter source code (which is very well commented, well done!), I realized that these empty revisions don't come from my filtering.
They were already present in my original dump and date from 2005.
Conclusion: check your data first!
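One way to do that check is to scan the original dump for revisions that carry no node records at all. Below is a minimal Python sketch, assuming the standard dump-file layout (a block of "Key: value" headers ended by a blank line, followed by Content-length bytes of body). The helper names (scan_dump, padding_revisions, etc.) are hypothetical and not part of Subversion; the sketch treats "empty" as "no Node-path records", ignores r0, and reads only revision properties, skipping file contents.

```python
import io
import sys
from typing import BinaryIO, Dict, List, Optional, Tuple


def read_headers(stream: BinaryIO) -> Optional[Dict[str, str]]:
    """Read one 'Key: value' header block; return None at end of stream."""
    line = stream.readline()
    while line == b"\n":  # skip blank separator lines between records
        line = stream.readline()
    if not line:
        return None
    headers = {}
    while line and line != b"\n":
        key, _, value = line.rstrip(b"\n").partition(b": ")
        headers[key.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
        line = stream.readline()
    return headers


def parse_props(blob: bytes) -> Dict[str, str]:
    """Parse a 'K <len> / V <len> ... PROPS-END' property block."""
    props: Dict[str, str] = {}
    key = ""
    pos = 0
    while True:
        nl = blob.index(b"\n", pos)
        line = blob[pos:nl]
        if line == b"PROPS-END":
            return props
        tag, length = line.split(b" ")
        start = nl + 1
        end = start + int(length)
        data = blob[start:end]
        pos = end + 1  # skip the newline that terminates the key or value
        if tag == b"K":
            key = data.decode("utf-8", "replace")
        elif tag == b"V":
            props[key] = data.decode("utf-8", "replace")
        # 'D' (property deletion) entries of incremental dumps are skipped


def skip_bytes(stream: BinaryIO, count: int) -> None:
    """Discard `count` bytes in 1 MiB chunks (works on pipes as well)."""
    while count > 0:
        chunk = stream.read(min(count, 1 << 20))
        if not chunk:
            break
        count -= len(chunk)


def scan_dump(stream: BinaryIO) -> List[Tuple[int, str, int]]:
    """Return (revision, svn:log, number of node records) for each revision."""
    revisions: List[list] = []
    while True:
        headers = read_headers(stream)
        if headers is None:
            break
        content_length = int(headers.get("Content-length", "0"))
        if "Revision-number" in headers:
            body = stream.read(content_length)
            prop_length = int(headers.get("Prop-content-length", "0"))
            props = parse_props(body[:prop_length]) if prop_length else {}
            revisions.append([int(headers["Revision-number"]), props.get("svn:log", ""), 0])
        else:
            skip_bytes(stream, content_length)  # node contents are not needed
            if "Node-path" in headers and revisions:
                revisions[-1][2] += 1  # count the node in the current revision
    return [(rev, log, nodes) for rev, log, nodes in revisions]


def padding_revisions(revisions: List[Tuple[int, str, int]]) -> List[Tuple[int, str]]:
    """Revisions other than r0 that contain no node records at all."""
    return [(rev, log) for rev, log, nodes in revisions if rev > 0 and nodes == 0]


def scan_dump_bytes(data: bytes) -> List[Tuple[int, str, int]]:
    """Convenience wrapper for a dump that is already in memory."""
    return scan_dump(io.BytesIO(data))


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as dump_file:
        all_revs = scan_dump(dump_file)
    empty = padding_revisions(all_revs)
    print(f"{len(empty)} of {len(all_revs)} revisions contain no node records")
    for rev, log in empty[:20]:
        print(f"r{rev}: {log!r}")
```

Running it on the unfiltered dump (e.g. `python find_empty_revs.py MassiveOldRepo.dump`, where the script name is made up) and then on the filtered one shows whether the padding revisions were already there before svndumpfilter ran.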
I can't be sure about your situation, but in my case, it WAS the filtering that caused thousands of those padding messages to show up in the log. I resolved it by including these two switches:
The second switch ensures that if, for example, your filter included revs 1000-1200 but excluded 1201-5000, the next rev it includes is numbered 1201, not 5001; keeping 5001 would require creating a few thousand empty padding revs.
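To make the arithmetic concrete, here is a small Python model of the renumbering described above: kept revisions are mapped to consecutive numbers, and a second function counts how many empty revisions it would take to keep the original numbers instead. This is a sketch of the numbering only, not svndumpfilter's implementation; the function names are hypothetical, and the example assumes r1 to r1200 all survive filtering, as the numbers in the answer imply.

```python
from typing import Dict, Iterable


def renumber(kept: Iterable[int]) -> Dict[int, int]:
    """Map each kept original revision to a consecutive new number.

    r0 always stays r0; every other kept revision gets the next free
    number, so the gaps left by excluded revisions disappear.
    """
    mapping = {0: 0}
    next_rev = 1
    for rev in sorted(set(kept)):
        if rev == 0:
            continue
        mapping[rev] = next_rev
        next_rev += 1
    return mapping


def padding_needed_without_renumbering(kept: Iterable[int]) -> int:
    """Empty revisions needed to preserve the original numbers of kept revisions."""
    revs = {rev for rev in kept if rev > 0}
    return max(revs) - len(revs) if revs else 0


if __name__ == "__main__":
    # Example from the answer: r1-r1200 kept, r1201-r5000 excluded, r5001 kept.
    kept = list(range(1, 1201)) + [5001]
    mapping = renumber(kept)
    print(mapping[1200], mapping[5001])              # 1200 1201
    print(padding_needed_without_renumbering(kept))  # 3800
```

A real renumbering also has to apply the same mapping to anything that refers to an old revision number, such as the source revision of a copied path; svndumpfilter does this kind of remapping itself when --renumber-revs is given.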