Python: Replace tags but preserve inner text V2

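I have a script that performs search and replace. It is based on a script here. I modified it to accept a file as input, but it doesn't seem to handle regular expressions well.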

The script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
import re
import glob

_replacements = {
    '[B]': '**',
    '[/B]': '**',
    '[I]': '//',
    '[/I]': '//',

}

def _do_replace(match):
    return _replacements.get(match.group(0))

def replace_tags(text, _re=re.compile('|'.join((r) for r in _replacements))): 
    return _re.sub(_do_replace, text)

def getfilecont(FN):
    if not glob.glob(FN): return -1 # No such file
    text = open(FN, 'rt').read()
    text = replace_tags(text, re.compile('|'.join(re.escape(r) for r in _replacements)))
    return replace_tags(text)

scriptName = os.path.basename(sys.argv[0])
if sys.argv[1:]:
    srcfile = glob.glob(sys.argv[1])[0]
else:
    print """%s: Error: you must specify a file to convert forum tags to wiki tags!
            Type %s FILENAME """ % (scriptName, scriptName)
    exit(1)
dstfile = os.path.join('.' , os.path.basename(srcfile)+'_wiki.txt')
converted = getfilecont(srcfile)
try:
    open(dstfile, 'wt+').write(converted)
    print 'Done.'
except:
    print 'Error saving file %s' % dstfile

print converted
#print replace_tags("This is an [[example]] sentence. It is [[{{awesome}}]].")

What I want is to replace

'[B]': '**',
'[/B]': '**',

with a single regex line like this:

\[B\](.*?)\[\/B\] : **\1**
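A minimal sketch of that single-rule replacement with `re.sub` and a backreference. The function name `convert_bold` is hypothetical, and the `re.DOTALL` flag is an assumption added so that bold text spanning line breaks is also converted.

```python
import re

# Non-greedy (.*?) so "[B]a[/B] and [B]b[/B]" gives two separate matches.
# re.DOTALL (an assumption, not in the question) lets the captured text span newlines.
BOLD_RE = re.compile(r'\[B\](.*?)\[/B\]', re.DOTALL)

def convert_bold(text):
    """Replace [B]...[/B] with **...**, keeping the inner text (group 1)."""
    return BOLD_RE.sub(r'**\1**', text)

print(convert_bold('This is [B]bold[/B] and [B]this too[/B].'))
# This is **bold** and **this too**.
```

The forward slash needs no escaping in a Python regular expression, so `[/B]` can be written as `\[/B\]`; the `\[\/B\]` form from the question also works.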

This would also help with UBB tags like this one:

[FONT=Arial]Hello, how are you?[/FONT]

Then I could use something like this:

\[FONT=(.*?)\](.*?)\[\/FONT\] : ''\2''
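A sketch of the FONT rule, assuming the goal is to drop the font name and wrap the inner text in two single quotes as in the line above. The name `convert_font` is hypothetical; group 1 (the font name) is captured but not used in the replacement.

```python
import re

# Group 1: font name (discarded). Group 2: inner text (kept).
FONT_RE = re.compile(r'\[FONT=(.*?)\](.*?)\[/FONT\]', re.DOTALL)

def convert_font(text):
    """Remove the [FONT=...]...[/FONT] wrapper and surround the inner text with ''."""
    return FONT_RE.sub(r"''\2''", text)

print(convert_font('[FONT=Arial]Hello, how are you?[/FONT]'))
# ''Hello, how are you?''
```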

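But I can't seem to do this with this script. The original source of this script has another way to do regex search and replace, but it applies re.sub to one tag at a time. The other advantage of this script, and the reason I want it, is that I can update it later and add as many lines as I need.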

Answer 1:

For starters, you are escaping the patterns on this line:

text = replace_tags(text, re.compile('|'.join(re.escape(r) for r in _replacements)))

re.escape takes a string and escapes it in such a way that, if the new string is used as a regular expression, it will match exactly the input string.
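A small illustration of that behavior, using the `[B]` tag from the question. Escaping turns the brackets into literal characters; without escaping, `[B]` is a character class that matches a single `B`.

```python
import re

tag = '[B]'
escaped = re.escape(tag)  # the brackets get backslash-escaped: \[B\]

# Escaped: the pattern matches the literal three-character tag only.
print(re.findall(escaped, 'x [B]y[/B]'))  # ['[B]']
# Unescaped: '[B]' is a character class, so it matches each lone 'B'.
print(re.findall(tag, 'x [B]y[/B]'))      # ['B', 'B']
```

This is why escaping is correct for literal keys such as `[B]`, but it also turns a real regex such as `\[B\](.*?)\[/B\]` into a pattern that only matches that exact text.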

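Removing re.escape won't completely solve your problem, however, because you find the replacement by simply looking up the matched text in your dictionary on this line: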

return _replacements.get(match.group(0))
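A short demonstration of why that lookup fails once the keys are regexes. The dictionary here is a hypothetical one-entry example: `match.group(0)` is the text that was matched, which is not equal to the pattern used as the key.

```python
import re

# Hypothetical dictionary keyed by a regex pattern rather than by literal text.
replacements = {r'\[B\](.*?)\[/B\]': r'**\1**'}
pattern = re.compile('|'.join(replacements))

m = pattern.search('say [B]hi[/B] now')
print(m.group(0))                    # [B]hi[/B]  (the matched text)
print(replacements.get(m.group(0)))  # None: the key is the pattern, not the text
```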

To fix this, you can make each pattern its own capturing group:

text = replace_tags(text, re.compile('|'.join('(%s)' % r for r in _replacements)))

You also need to know which pattern goes with which replacement. Something like this might work:

_replacements_dict = {
    '[B]': '**',
    '[/B]': '**',
    '[I]': '//',
    '[/I]': '//',
}
_replacements, _subs = zip(*_replacements_dict.items())

def _do_replace(match):
    for i, group in enumerate(match.groups()):
        if group:
            return _subs[i]

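Note that this changes _replacements into a list of patterns and creates a parallel array, _subs, of the actual replacements. (I would have called them regexes and replacements, but didn't want to re-edit every occurrence of "_replacements".)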

Answer 2:

Someone did it here.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
import re
import glob

_replacements_dict = {
    '\[B\]': '**',
    '\[\/B\]': '**',
    '\[I\]': '//',
    '\[\/I\]': '//',
    '\[IMG\]' : '{{',
    '\[\/IMG\]' : '}}',
    '\[URL=(.*?)\]\s*(.*?)\s*\[\/URL\]' : r'[[\1|\2]]',
    '\[URL\]\s*(.*?)\s*\[\/URL\]' : r'[[\1]]',
    '\[FONT=(.*?)\]' : '',
    '\[color=(.*?)\]' : '',
    '\[SIZE=(.*?)\]' : '',
    '\[CENTER]' : '',
    '\[\/CENTER]' : '',
    '\[\/FONT\]' : '',
    '\[\/color\]' : '',
    '\[\/size\]' : '',
}
_replacements, _subs = zip(*_replacements_dict.items())

def replace_tags(text):
    for i, _s in enumerate(_replacements):
        tag_re = re.compile(r''+_s,  re.I) 
        text, n = tag_re.subn(r''+_subs[i], text)
    return text


def getfilecont(FN):
    if not glob.glob(FN): return -1 # No such file
    text = open(FN, 'rt').read()
    return replace_tags(text)

scriptName = os.path.basename(sys.argv[0])
if sys.argv[1:]:
    srcfile = glob.glob(sys.argv[1])[0]
else:
    print """%s: Error: you must specify a file to convert forum tags to wiki tags!
            Type %s FILENAME """ % (scriptName, scriptName)
    exit(1)
dstfile = os.path.join('.' , os.path.basename(srcfile)+'_wiki.txt')
converted = getfilecont(srcfile)
try:
    open(dstfile, 'wt+').write(converted)
    print 'Done.'
except:
    print 'Error saving file %s' % dstfile

#print converted
#print replace_tags("This is an [[example]] sentence. It is [[{{awesome}}]].")

http://pastie.org/1447448
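A sketch of the same one-rule-per-line idea with two changes that are assumptions, not part of the linked script: the rules live in an ordered list instead of a dict (dict iteration order is not guaranteed before Python 3.7, and order can matter when rules overlap), and each regex is compiled once up front. Like the script above, it matches case-insensitively with `re.I`; `re.DOTALL` is an added assumption. Pairing the opening and closing tags in one rule keeps the inner text via backreferences.

```python
import re

# Ordered (pattern, replacement) pairs; rules are applied top to bottom.
RULES = [
    (r'\[B\](.*?)\[/B\]', r'**\1**'),
    (r'\[I\](.*?)\[/I\]', r'//\1//'),
    (r'\[URL=(.*?)\]\s*(.*?)\s*\[/URL\]', r'[[\1|\2]]'),
    (r'\[URL\]\s*(.*?)\s*\[/URL\]', r'[[\1]]'),
    (r'\[FONT=(.*?)\](.*?)\[/FONT\]', r'\2'),
]

# Compile each rule once; re.I as in the script, re.DOTALL is an added assumption.
COMPILED = [(re.compile(p, re.I | re.DOTALL), s) for p, s in RULES]

def replace_tags(text):
    """Apply every rule in order; backreferences like \\1 keep the inner text."""
    for regex, sub in COMPILED:
        text = regex.sub(sub, text)
    return text

print(replace_tags('[url=http://example.com] Site [/url] [b]hi[/b]'))
# [[http://example.com|Site]] **hi**
```

Adding a new conversion then means appending one `(pattern, replacement)` line to `RULES`.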

Source: Python: Replace tags but preserve inner text V2