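I need some help with a program I'm writing in Python.
Suppose I want to replace every instance of the word "steak" with "ghost" (just go with it...), but I also want to replace every instance of the word "ghost" with "steak" at the same time. The following code does not work: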
s="The scary ghost ordered an expensive steak"
print s
s=s.replace("steak","ghost")
s=s.replace("ghost","steak")
print s
It prints: The scary steak ordered an expensive steak
What I'm trying to get is: The scary steak ordered an expensive ghost
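A quick way to see why the two calls cancel each other out is to look at the intermediate string. The sketch below only re-runs the two calls from the question (the variable names `intermediate` and `result` are mine, and `print` is written with parentheses so it also runs on Python 3). After the first call both words have become "ghost", so the second call cannot tell them apart.

```python
s = "The scary ghost ordered an expensive steak"

# First call: every "steak" becomes "ghost", so the text now holds two ghosts.
intermediate = s.replace("steak", "ghost")
print(intermediate)

# Second call: both ghosts (the original one and the converted steak)
# are turned into "steak", which loses the original distinction.
result = intermediate.replace("ghost", "steak")
print(result)
```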
I'd probably use a regex here:
>>> import re
>>> s = "The scary ghost ordered an expensive steak"
>>> sub_dict = {'ghost':'steak','steak':'ghost'}
>>> regex = '|'.join(sub_dict)
>>> re.sub(regex, lambda m: sub_dict[m.group()], s)
'The scary steak ordered an expensive ghost'
Or, as a function that you can copy/paste:
import re
def word_replace(replace_dict,s):
    regex = '|'.join(replace_dict)
    return re.sub(regex, lambda m: replace_dict[m.group()], s)
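Basically, I create a mapping of the words that I want to replace with other words (sub_dict). From that mapping I can build a regular expression. In this case, the regular expression is "steak|ghost" (or "ghost|steak"; the order doesn't matter here), and the regex engine does the rest of the work: it finds non-overlapping matches and replaces each one accordingly.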
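Because every match is looked up in the mapping during a single pass, the same function should also handle more than two words. Here is a small sketch with a three-way rotation; the `rotation` dictionary and the sample sentence are made up for illustration.

```python
import re

def word_replace(replace_dict, s):
    # Build an alternation of all keys and replace each match by its mapped value.
    regex = '|'.join(replace_dict)
    return re.sub(regex, lambda m: replace_dict[m.group()], s)

# Hypothetical mapping: ghost -> steak -> wine -> ghost
rotation = {'ghost': 'steak', 'steak': 'wine', 'wine': 'ghost'}
rotated = word_replace(rotation, "The ghost ordered steak and wine")
print(rotated)
```

Each word is replaced exactly once, since text produced by a replacement is never scanned again.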
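Some possibly useful modifications:
- regex = '|'.join(map(re.escape,replace_dict))
  allows the words to contain characters that have special meaning in regex syntax (such as parentheses). It escapes those characters so that the regex matches the literal text.
- regex = '|'.join(r'\b{0}\b'.format(x) for x in replace_dict)
  makes sure that we don't match when one of our words is a substring of another word. In other words, it changes he to she but does not change the to tshe.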
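The two modifications can be combined. The following is only a sketch of one way to do it: the function name `word_replace_whole` is mine, and it assumes every key starts and ends with a word character, since `\b` only marks a boundary between a word character and a non-word character.

```python
import re

def word_replace_whole(replace_dict, s):
    # Escape each key so regex metacharacters are matched literally, then wrap
    # the whole alternation in \b ... \b so that only whole words are replaced.
    alternation = '|'.join(map(re.escape, replace_dict))
    pattern = r'\b(?:' + alternation + r')\b'
    return re.sub(pattern, lambda m: replace_dict[m.group()], s)

swapped = word_replace_whole({'he': 'she', 'she': 'he'}, 'he said the word to she')
print(swapped)
```

The "the" in the sample sentence is left alone because there is no word boundary between "t" and "he".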
Split the string on one of the targets, do the replacement in each piece, and put the whole thing back together.
pieces = s.split('steak')
s = 'ghost'.join(piece.replace('ghost', 'steak') for piece in pieces)
This works exactly as .replace() would, including ignoring word boundaries. So it will turn "steak ghosts" into "ghost steaks".
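Wrapped in a function, the split/join idea might look like the sketch below. The name `swap` and its parameters are mine, and it assumes that neither word contains the other.

```python
def swap(s, a, b):
    # Split on `a`, so the pieces contain no `a` at all; turn every `b` inside
    # the pieces into `a`; then rejoin the pieces with `b` where `a` used to be.
    return b.join(piece.replace(b, a) for piece in s.split(a))

print(swap("The scary ghost ordered an expensive steak", "steak", "ghost"))
print(swap("steak ghosts", "steak", "ghost"))
```

The second call shows the behaviour described above: word boundaries are ignored, so "ghosts" becomes "steaks".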
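Rename one of the words to a temporary value that doesn't occur in the text. Note that this wouldn't be the most efficient way for a very large text. For that, re.sub might be more appropriate.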
s="The scary ghost ordered an expensive steak"
print s
s=s.replace("steak","temp")
s=s.replace("ghost","steak")
s=s.replace("temp","ghost")
print s
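The fixed string "temp" only works if it never appears in the text. One possible way to choose a safer placeholder is sketched below. The function name `swap_with_placeholder` is mine, the words are assumed to be non-empty, and `uuid.uuid4().hex` is just one convenient source of an unlikely string.

```python
import uuid

def swap_with_placeholder(s, a, b):
    # Draw random placeholders until one is found that does not occur in the
    # text and does not itself contain either of the two words.
    placeholder = uuid.uuid4().hex
    while placeholder in s or a in placeholder or b in placeholder:
        placeholder = uuid.uuid4().hex
    # Park `a` under the placeholder, move `b` to `a`, then release the placeholder as `b`.
    return s.replace(a, placeholder).replace(b, a).replace(placeholder, b)

print(swap_with_placeholder("The scary ghost ordered an expensive steak", "steak", "ghost"))
```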
Use the count argument of the string.replace() method. So, using your code, you would have:
s="The scary ghost ordered an expensive steak"
print s
s=s.replace("steak","ghost", 1)
s=s.replace("ghost","steak", 1)
print s
http://docs.python.org/2/library/stdtypes.html
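One caveat worth checking before relying on this: with a count of 1, the result appears to depend on each word occurring once and on "ghost" coming before "steak". The sketch below (the function name `swap_once` and the second sentence are mine) runs the same two calls on both word orders.

```python
def swap_once(s):
    # Same two calls as above, each limited to the first occurrence.
    s = s.replace("steak", "ghost", 1)
    s = s.replace("ghost", "steak", 1)
    return s

# "ghost" comes first: the words are swapped as intended.
print(swap_once("The scary ghost ordered an expensive steak"))

# "steak" comes first: the second call undoes the first, so nothing changes.
print(swap_once("The expensive steak scared the ghost"))
```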
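How about something like this? Split the original string into a list of words and look each one up in a translation dictionary. This keeps your core code short, and you only need to adjust the dictionary when you want to change the translation. It is also easy to port to a function: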
def translate_line(s, translation_dict):
    line = []
    for i in s.split():
        # To take account for punctuation, strip all non-alnum from the
        # word before looking up the translation.
        i = ''.join(ch for ch in i if ch.isalnum())
        line.append(translation_dict.get(i, i))
    return ' '.join(line)
>>> translate_line("The scary ghost ordered an expensive steak", {'steak': 'ghost', 'ghost': 'steak'})
'The scary steak ordered an expensive ghost'
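Note that translate_line drops the punctuation from every word (translated or not) and collapses runs of whitespace into single spaces. If the punctuation should survive, one option is to apply the dictionary lookup to each run of word characters instead. This is a sketch of a different technique (re.sub with a callback), not the function above, and the name `translate_words` is mine.

```python
import re

def translate_words(s, translation_dict):
    # Look up every run of word characters in the dictionary; words without an
    # entry, punctuation and whitespace are copied through unchanged.
    return re.sub(r'\w+', lambda m: translation_dict.get(m.group(), m.group()), s)

menu = translate_words("The ghost, hungry, ordered steak.",
                       {'steak': 'ghost', 'ghost': 'steak'})
print(menu)
```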
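Note: considering the number of views this question has received, I undeleted this answer and rewrote it for different types of test cases.
From the answers, I considered four competing implementations: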
>>> import re
>>> from uuid import uuid4
>>> from timeit import Timer
>>> # Mapping and pattern from the regex answer above, built once outside the functions
>>> sub_dict = {'ghost':'steak','steak':'ghost'}
>>> regex = '|'.join(sub_dict)
>>> def sub_noregex(hay):
    """
    The join and replace routine which outperforms the regex implementation. This
    version uses a generator expression
    """
    return 'steak'.join(e.replace('steak','ghost') for e in hay.split('ghost'))
>>> def sub_regex(hay):
    """
    This is a straightforward regex implementation as suggested by @mgilson
    Note, so that the overhead doesn't add to the cumulative sum, I have placed
    the regex creation routine outside the function
    """
    return re.sub(regex,lambda m:sub_dict[m.group()],hay)
>>> def sub_temp(hay, _uuid = str(uuid4())):
    """
    Similar to Mark Tolonen's implementation but uses a uuid for the temporary string
    value to reduce the chance of a collision
    """
    hay = hay.replace("steak",_uuid).replace("ghost","steak").replace(_uuid,"ghost")
    return hay
>>> def sub_noregex_LC(hay):
    """
    The join and replace routine which outperforms the regex implementation. This
    version uses a list comprehension
    """
    return 'steak'.join([e.replace('steak','ghost') for e in hay.split('ghost')])
A generalized timeit function
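>>> def compare(n, hay):
    # Maps each function name to an extra name that its timing setup must import.
    foo = {"sub_regex": "re",
           "sub_noregex":"",
           "sub_noregex_LC":"",
           "sub_temp":"",
           }
    stmt = "{}(hay)"
    # Note: the setup imports hay from __main__, so the timed call uses a
    # module-level hay (which must exist) rather than this function's argument.
    setup = "from __main__ import hay,"
    for k, v in foo.items():
        t = Timer(stmt = stmt.format(k), setup = setup+ ','.join([k, v] if v else [k]))
        yield t.timeit(n)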
And a generalized test routine
>>> def test(*args, **kwargs):
    n = kwargs['repeat']
    print "{:50}{:^15}{:^15}{:^15}{:^15}".format("Test Case", "sub_temp",
                                                 "sub_noregex ", "sub_regex",
                                                 "sub_noregex_LC ")
    for hay in args:
        hay, hay_str = hay
        print "{:50}{:15.10}{:15.10}{:15.10}{:15.10}".format(hay_str, *compare(n, hay))
And the test results are as follows
>>> test((' '.join(['steak', 'ghost']*1000), "Multiple repeatation of search key"),
         ('garbage '*998 + 'steak ghost', "Single repeatation of search key at the end"),
         ('steak ' + 'garbage '*998 + 'ghost', "Single repeatation of at either end"),
         ("The scary ghost ordered an expensive steak", "Single repeatation for smaller string"),
         repeat = 100000)
Test Case                                            sub_temp     sub_noregex      sub_regex   sub_noregex_LC
Multiple repeatation of search key                   0.2022748797   0.3517142003   0.4518992298   0.1812594258
Single repeatation of search key at the end          0.2026047957   0.3508259952   0.4399926194   0.1915298898
Single repeatation of at either end                  0.1877455356   0.3561734007   0.4228843986   0.2164233388
Single repeatation for smaller string                0.2061019057   0.3145984487   0.4252060592   0.1989413449
>>>
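Based on the test results:
- The non-regex LC version and the temporary-variable substitution perform better, although the performance of the temporary-variable approach is not consistent
- The LC version performs better than the generator version (confirmed)
- The regex version is more than two times slower (so if this piece of code is a bottleneck, a change of implementation can be reconsidered)
- The regex and non-regex versions are equally robust and can scale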
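For anyone who wants to repeat the comparison, here is a minimal sketch of a similar harness. It is not the code that produced the numbers above: it is written for Python 3, covers only two of the four functions, and passes callables to timeit.timeit so that each test string is handed to the function explicitly. The names `SUB_DICT`, `PATTERN` and `sub_noregex_lc` and the default repeat count are my own choices, and the timings will differ from machine to machine.

```python
import re
import timeit

SUB_DICT = {'ghost': 'steak', 'steak': 'ghost'}
# Compile the pattern once so that its construction is not part of the timing.
PATTERN = re.compile('|'.join(map(re.escape, SUB_DICT)))

def sub_regex(hay):
    # Single pass: each match is replaced by its partner from the mapping.
    return PATTERN.sub(lambda m: SUB_DICT[m.group()], hay)

def sub_noregex_lc(hay):
    # Split on one word, replace the other inside the pieces, rejoin.
    return 'steak'.join([e.replace('steak', 'ghost') for e in hay.split('ghost')])

def compare(hay, n=1000):
    # Time each implementation on exactly the string passed in.
    results = {}
    for func in (sub_regex, sub_noregex_lc):
        results[func.__name__] = timeit.timeit(lambda: func(hay), number=n)
    return results

timings = compare(' '.join(['steak', 'ghost'] * 1000))
for name, seconds in sorted(timings.items()):
    print(name, seconds)
```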