I have a dictionary with the strings to be replaced as keys and their replacements as values. Other than looking through the string token by token, is there a better/faster way of doing the replacement?
I've been doing it like this:
segmenter = {'foobar':'foo bar', 'withoutspace':'without space', 'barbar': 'bar bar'}
sentence = "this is a foobar in a barbar withoutspace"
for i in sentence.split():
    if i in segmenter:
        sentence.replace(i, segmenter[i])
Strings are immutable in Python, so str.replace returns a new string instead of modifying the original one. Your loop discards that return value, which is why sentence never changes. You can use str.join() and a list comprehension here:
>>> segmenter = {'foobar':'foo bar', 'withoutspace':'without space', 'barbar': 'bar bar'}
>>> sentence = "this is a foobar in a barbar withoutspace"
>>> " ".join( [ segmenter.get(word,word) for word in sentence.split()] )
'this is a foo bar in a bar bar without space'
Another problem with str.replace is that it'll also replace words like "abarbarb" with "abar barb".
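To make the difference concrete, here is a small sketch that runs both approaches on the same input. The test sentence and the names replaced and joined are made up for illustration; the loop is the one from the question with the result of str.replace assigned back.

```python
segmenter = {'foobar': 'foo bar', 'withoutspace': 'without space', 'barbar': 'bar bar'}
sentence = "an abarbarb is not a barbar"  # hypothetical test input

# str.replace works on substrings, so replacing the token "barbar"
# also rewrites the inside of "abarbarb".
replaced = sentence
for word in sentence.split():
    if word in segmenter:
        replaced = replaced.replace(word, segmenter[word])

# A per-token dictionary lookup only touches whole words.
joined = " ".join(segmenter.get(word, word) for word in sentence.split())

print(replaced)  # an abar barb is not a bar bar
print(joined)    # an abarbarb is not a bar bar
```

Note that the join version splits on whitespace, so runs of spaces, tabs, or newlines in the original sentence collapse into single spaces.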
re.sub can call a function that returns the substitution:
import re

segmenter = {'foobar':'foo bar', 'withoutspace':'without space', 'barbar': 'bar bar'}
sentence = "this is a foobar in a barbar withoutspace"

def fn(match):
    # Look up the replacement for whichever key matched.
    return segmenter[match.group()]

# Join the escaped keys into a single alternation pattern.
print(re.sub('|'.join(re.escape(k) for k in segmenter), fn, sentence))
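The plain alternation pattern still matches keys inside longer words, the same way str.replace does. If that matters, one possible variant is to wrap the alternation in \b word boundaries. This is only a sketch: the helper name segment and the test sentence are hypothetical, and it assumes every key starts and ends with a word character (otherwise \b will not behave as intended).

```python
import re

segmenter = {'foobar': 'foo bar', 'withoutspace': 'without space', 'barbar': 'bar bar'}

# \b on both sides restricts matches to whole words, so the "barbar"
# inside "abarbarb" is left alone. Assumes keys begin and end with
# word characters.
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in segmenter) + r')\b')

def segment(text):
    # Hypothetical helper: replace each matched key with its dictionary value.
    return pattern.sub(lambda m: segmenter[m.group()], text)

result = segment("this is a foobar in a barbar withoutspace abarbarb")
print(result)  # this is a foo bar in a bar bar without space abarbarb
```

Compiling the pattern once also avoids rebuilding it if you process many sentences, and unlike the split/join approach it leaves the original whitespace and any adjacent punctuation untouched.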