REGEX for svndumptool

Posted 2020-07-17 06:38

I have a large (30+GB) legacy SVN repo with a lot of externals defined that needs to be cloned to a new server. Because the repo was originally created in the pre-SVN v1.5 days, many of its externals use absolute paths that refer back to the old server name. I want to remove all the absolute paths and make them relative so that the migration will work.

I found svndumptool via this question. It works great on some of the externals, but I haven't been able to figure out a REGEX that will work for the rest of the cases.

Here are examples of the six different types of external definitions that I found in the repo by running the command: svn propget --recursive svn:externals %REPODIR_FILE%/%REPO%

CaseA https://svn.acme.com/svn/test/branches/project.x
CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y
https://svn.acme.com/svn/test/branches/project.z CaseC
-r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD
CaseE  https://svn.acme.com/svn/test/branches/project.x
CaseF -r21  https://svn.acme.com/svn/test/branches/project.y

Note that CaseE is the same as CaseA except for the double spacing before the https.

Note that CaseF is almost the same as CaseB except for the missing space between the -r and the revision number and the double spacing before the https.

I'm using rubular.com to test my REGEX. Currently I'm using the following expression:

^(\S+) (|-r ?\d* ?)https:\/\/svn.acme.com(\S+)

Which gives me:

Match 1
1.  CaseA
2.   
3.  /svn/test/branches/project.x
Match 2
1.  CaseB
2.  -r 19
3.  /svn/test/branches/project.y

I haven't been able to come up with a REGEX that would parse cases C and D into something like the following:

Match 3
1.  /svn/test/branches/project.z
2.  
3.  CaseC
Match 4
1.  -r 20
2.  /svn/test/branches/project.z@20
3.  CaseD

svndumptool does seem to require that I split out the different components of the external definition so that it can correctly reassemble it in the correct (SVN v1.5) syntax.

Any help from the REGEX gods would be much appreciated :-)

3 Answers
萌系小妹纸
Reply #2 · 2020-07-17 07:16

In case someone using Python ends up in here:

import re

test_externals ="""
CaseA https://svn.acme.com/svn/test/branches/project.x
CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y
https://svn.acme.com/svn/test/branches/proje_9ct.z/123 CaseC1
https://svn.acme.com/svn/test/branches/proje_9ct.z/123   CaseC2
https://svn.acme.com/svn/test/branches/proje_9ct.z/123    CaseC3
https://svn.acme.com/svn/test/branches/project.zCaseC4
-r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD1
-r27 https://svn.acme.com/svn/test/branches/project.z@27 CaseD2
-r37 https://svn.acme.com/svn/test/branches/project.z CaseD3
https://svn.acme.com/svn/test/branches/project.z@88 CaseD4
 -r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD1
CaseE -r21  https://svn.acme.com/svn/test/branches/project.y
"""

# Inside a character class, keep '-' last so it is a literal hyphen rather than a range operator.
pat_url    = r'(?P<url>https?://(?:[a-zA-Z0-9._-]+)(?:[a-zA-Z0-9._/-]+))'
pat_folder = r'(?P<folder>[a-zA-Z0-9/._-]+)'
pat_pegrev = r'(?:@(?P<peg_revision>\d+))'
pat_oprev  = r'(?:-r\s?(?P<op_rev>\d+))'

regex_externals = {
    'CaseA': re.compile(r'^\s*{folder}\s{url}$'.format(folder=pat_folder, url=pat_url)),
    'CaseB': re.compile(r'^\s*{folder}\s{oprev}\s{url}$'.format(folder=pat_folder, oprev=pat_oprev, url=pat_url)),
    'CaseC': re.compile(r'^\s*{url}\s{folder}$'.format(folder=pat_folder, url=pat_url)),
    'CaseD': re.compile(r'^\s*{oprev}?\s{url}{pegrev}?\s*{folder}$'.format(folder=pat_folder, oprev=pat_oprev, pegrev=pat_pegrev, url=pat_url)),
}

for r in regex_externals:
    print('%s: %s' % (r, regex_externals[r].pattern))

# Try every pattern against every test line and print the captured groups of each match.
for case in test_externals.split('\n'):
    for pat in regex_externals:
        match = re.search(regex_externals[pat], case)
        if match:
            print('\n\n%s: %s' % (pat, case))
            for g in match.groups():
                print('\t%s' % g)
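
As an alternative to one pattern per case, a line can be split on whitespace and classified token by token, which sidesteps the single/double spacing and -r21 versus -r 21 differences. The following is only a sketch under a few assumptions: the names to_relative and OLD_HOST are made up for this illustration, paths are assumed to contain no spaces or quoting, and the output order (optional revision, then the path, then the local directory) is the form the question asks for.

```python
import re

OLD_HOST = "https://svn.acme.com"  # absolute prefix to strip (taken from the examples)


def to_relative(line, host=OLD_HOST):
    """Rewrite one svn:externals line as '[-r N] /path[@peg] dir'.

    Returns None for blank lines and for lines that do not contain exactly
    one URL on `host` plus exactly one local directory.
    """
    # Normalize '-r21' to '-r 21' so the revision is always its own token.
    tokens = re.sub(r"(?<!\S)-r(\d+)", r"-r \1", line).split()
    rev = url = folder = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-r" and i + 1 < len(tokens):
            rev = tokens[i + 1]
            i += 2
            continue
        if tok.startswith(host + "/"):
            if url is not None:
                return None  # more than one URL: leave the line alone
            url = tok[len(host):]  # keep only the server-relative part
        else:
            if folder is not None:
                return None  # more than one non-URL token: ambiguous
            folder = tok
        i += 1
    if url is None or folder is None:
        return None
    prefix = "-r %s " % rev if rev else ""
    return "%s%s %s" % (prefix, url, folder)


if __name__ == "__main__":
    for sample in (
        "CaseA https://svn.acme.com/svn/test/branches/project.x",
        "-r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD",
        "CaseF -r21  https://svn.acme.com/svn/test/branches/project.y",
    ):
        print(to_relative(sample))
```

Lines that the function cannot classify come back as None, so they can be collected and reviewed by hand instead of being rewritten incorrectly.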
看我几分像从前
Reply #3 · 2020-07-17 07:18

Here is the set of commands that worked for me; hopefully it helps someone trying to fix a borked SVN repo in the future. Remember: friends don't let friends use absolute externals!

This procedure reduced the list from over 30K defined externals to just 30 in the first six iterations.

:: List of types of externals we need to deal with
:: CaseA https://svn.acme.com/svn/test/branches/project.x
:: CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y
:: https://svn.acme.com/svn/test/branches/project.z CaseC
:: -r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD
:: CaseE  https://svn.acme.com/svn/test/branches/project.x
:: CaseF -r21  https://svn.acme.com/svn/test/branches/project.y

:: SVN Dump Tool
SET SVNDUMPTOOL=C:\support\svndumptool\v0.6.1\svndumptool.exe
SET REPODIR=D:\Repositories
SET REPODIR_FILE=file:///D:/Repositories
SET DUMPDIR=D:\Dumps
SET REPO=test
SET SVN="C:\Program Files (x86)\VisualSVN Server\bin\svn.exe"
SET SVNADMIN="C:\Program Files (x86)\VisualSVN Server\bin\svnadmin.exe"
SET CREATE=%SVNADMIN% create
SET LOAD=%SVNADMIN% load --ignore-uuid
SET DUMP=%SVNADMIN% dump

:: Get a list of the externals in the original repo
%SVN% propget --recursive svn:externals %REPODIR_FILE%/%REPO%>%DUMPDIR%\%REPO%.externals

:: Dump the repo
%DUMP% %REPODIR%\%REPO% > %DUMPDIR%\%REPO%.dump

:: Transform the repo
:: CaseA
%SVNDUMPTOOL% transform-prop svn:externals "^(\S+) https://svn.acme.com(\S+)" "\2 \1" %DUMPDIR%\%REPO%.dump %DUMPDIR%\%REPO%_A.dump
:: Delete the dump to save disk space; each dump file iteration is ~300GB
DEL %DUMPDIR%\%REPO%.dump
:: CaseB
%SVNDUMPTOOL% transform-prop svn:externals "^(\S+) (-r ?\d* ?)https://svn.acme.com(\S+)" "\2\3 \1" %DUMPDIR%\%REPO%_A.dump %DUMPDIR%\%REPO%_AB.dump
DEL %DUMPDIR%\%REPO%_A.dump
:: CaseC
%SVNDUMPTOOL% transform-prop svn:externals "^(\S*)https://svn.acme.com(\S*)" "\2\1" %DUMPDIR%\%REPO%_AB.dump %DUMPDIR%\%REPO%_ABC.dump
DEL %DUMPDIR%\%REPO%_AB.dump
:: CaseD
%SVNDUMPTOOL% transform-prop svn:externals "^(-r ?\d* ?)(\S+) https://svn.acme.com(\S+)" "\1\2 \3" %DUMPDIR%\%REPO%_ABC.dump %DUMPDIR%\%REPO%_ABCD.dump
DEL %DUMPDIR%\%REPO%_ABC.dump
:: CaseE
%SVNDUMPTOOL% transform-prop svn:externals "^(\S+)  https://svn.acme.com(\S+)" "\2 \1" %DUMPDIR%\%REPO%_ABCD.dump %DUMPDIR%\%REPO%_ABCDE.dump
DEL %DUMPDIR%\%REPO%_ABCD.dump
:: CaseF (group 2 ends without a trailing space in this pattern, so the replacement separates \2 and \3)
%SVNDUMPTOOL% transform-prop svn:externals "^(\S+) (-r ?\d* ?)  https://svn.acme.com(\S+)" "\2 \3 \1" %DUMPDIR%\%REPO%_ABCDE.dump %DUMPDIR%\%REPO%_ABCDEF.dump
DEL %DUMPDIR%\%REPO%_ABCDE.dump

:: Delete the old repo
RMDIR /Q /S %REPODIR%\%REPO%
:: Create a new clean repo
%CREATE% %REPODIR%\%REPO%
:: Load the fixed dump
%LOAD% %REPODIR%\%REPO% < %DUMPDIR%\%REPO%_ABCDEF.dump
:: Get the new list of externals
%SVN% propget --recursive svn:externals %REPODIR_FILE%/%REPO%>%DUMPDIR%\%REPO%_ABCDEF.externals
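
Since each pass over the dump is slow and the files are large, it can help to dry-run the six pattern/replacement pairs on a few sample lines first. The sketch below does that in Python. It assumes svndumptool substitutes roughly the way Python's re.sub does with ^ anchored at each line of the property value, which I have not checked against its source; PASSES, SAMPLE and dry_run are names invented for this illustration. The CaseF replacement is written here as \2 \3 \1, because in that pass group 2 captures -r21 without a trailing space.

```python
import re

# (pattern, replacement) pairs mirroring the transform-prop passes, in order.
PASSES = [
    (r"^(\S+) https://svn.acme.com(\S+)", r"\2 \1"),                  # CaseA
    (r"^(\S+) (-r ?\d* ?)https://svn.acme.com(\S+)", r"\2\3 \1"),     # CaseB
    (r"^(\S*)https://svn.acme.com(\S*)", r"\2\1"),                    # CaseC
    (r"^(-r ?\d* ?)(\S+) https://svn.acme.com(\S+)", r"\1\2 \3"),     # CaseD
    (r"^(\S+)  https://svn.acme.com(\S+)", r"\2 \1"),                 # CaseE
    (r"^(\S+) (-r ?\d* ?)  https://svn.acme.com(\S+)", r"\2 \3 \1"),  # CaseF
]

SAMPLE = """\
CaseA https://svn.acme.com/svn/test/branches/project.x
CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y
https://svn.acme.com/svn/test/branches/project.z CaseC
-r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD
CaseE  https://svn.acme.com/svn/test/branches/project.x
CaseF -r21  https://svn.acme.com/svn/test/branches/project.y
"""


def dry_run(text, passes=PASSES):
    """Apply every pass in order and return the transformed text."""
    for pattern, replacement in passes:
        # Assumption: line-anchored substitution, as with re.MULTILINE.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    print(dry_run(SAMPLE))
```

Any line that still contains https://svn.acme.com after the last pass is a case the patterns do not cover yet.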
叛逆
Reply #4 · 2020-07-17 07:30

Here are two choices, since you're testing with Ruby. However, do you have any other regular expression flavor available on your machine?

1st Choice (Absolute path AND 3 matches)

^(-r ?\d*|(?:https:\/\/svn.acme.com)?\S+|\S+) (-r ?\d*|\S+)(?: (\S+))?$

Demo

http://rubular.com/r/dBMVd1arVJ


2nd Choice (Relative path AND multiple matches)

^(\S+) (?:https:\/\/svn\.acme\.com)(.+)|(\S+) (-r ?\d+) (?:https:\/\/svn\.acme\.com)(.+)|(?:(-r ?\d+) )?(?:https:\/\/svn\.acme\.com)(.+) (\S+)

Demo

http://rubular.com/r/f3t3OH5Wqn
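
Because the 2nd choice is an alternation, each branch fills a different set of group numbers, which matters when writing the replacement string. Here is a small sketch for checking which groups are populated per line. It uses Python's re module rather than Ruby (the expression happens to be valid in both), and SECOND_CHOICE and populated_groups are names made up for this example.

```python
import re

# The "2nd Choice" expression, split across lines at each alternation.
SECOND_CHOICE = re.compile(
    r"^(\S+) (?:https:\/\/svn\.acme\.com)(.+)"
    r"|(\S+) (-r ?\d+) (?:https:\/\/svn\.acme\.com)(.+)"
    r"|(?:(-r ?\d+) )?(?:https:\/\/svn\.acme\.com)(.+) (\S+)"
)


def populated_groups(line):
    """Return {group_number: text} for the groups that took part in the match."""
    match = SECOND_CHOICE.search(line)
    if match is None:
        return {}
    return {i: g for i, g in enumerate(match.groups(), start=1) if g is not None}


if __name__ == "__main__":
    for sample in (
        "CaseA https://svn.acme.com/svn/test/branches/project.x",
        "CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y",
        "https://svn.acme.com/svn/test/branches/project.z CaseC",
        "-r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD",
    ):
        print(sample, "->", populated_groups(sample))
```

For these four samples, CaseA fills groups 1 and 2, CaseB fills 3 to 5, CaseC fills 7 and 8, and CaseD fills 6 to 8.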
