I need help with a program I'm making in Python.
Assume I wanted to replace every instance of the word "steak" with "ghost" (just go with it...), but I also wanted to replace every instance of the word "ghost" with "steak" at the same time. The following code does not work:
s="The scary ghost ordered an expensive steak"
print s
s=s.replace("steak","ghost")
s=s.replace("ghost","steak")
print s
It prints: The scary steak ordered an expensive steak
What I'm trying to get is: The scary steak ordered an expensive ghost
Rename one of the words to a temp value that doesn't occur in the text. Note this wouldn't be the most efficient way for a very large text. For that, a re.sub might be more appropriate.

I'd probably use a regex here:
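A minimal sketch of what such a regex-based swap could look like, assuming a mapping named sub_dict as described in this answer. The variable names result and regex are illustrative only.

```python
import re

s = "The scary ghost ordered an expensive steak"

# Mapping of each word to its replacement (name taken from the answer text).
sub_dict = {"ghost": "steak", "steak": "ghost"}

# Build an alternation such as "ghost|steak" from the mapping keys.
regex = "|".join(sub_dict)

# Each match is looked up in the mapping, so both swaps happen in one pass.
result = re.sub(regex, lambda m: sub_dict[m.group()], s)
print(result)  # The scary steak ordered an expensive ghost
```

Because the text is scanned once, a word that has just been replaced is never replaced again, which is what goes wrong with two chained .replace() calls.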
Or, as a function which you can copy/paste:
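One possible shape for such a function, as a sketch only. The name replace_all and its parameter names are hypothetical, and the keys are assumed to be plain words with no special regex characters.

```python
import re


def replace_all(text, sub_dict):
    """Swap every key of sub_dict found in text for its value, in one pass."""
    # Assumes the keys contain no regex metacharacters.
    regex = "|".join(sub_dict)
    return re.sub(regex, lambda m: sub_dict[m.group()], text)


swapped = replace_all(
    "The scary ghost ordered an expensive steak",
    {"ghost": "steak", "steak": "ghost"},
)
print(swapped)  # The scary steak ordered an expensive ghost
```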
Basically, I create a mapping of words that I want to replace with other words (sub_dict). I can create a regular expression from that mapping. In this case, the regular expression is "steak|ghost" (or "ghost|steak" -- order doesn't matter) and the regex engine does the rest of the work of finding non-overlapping sequences and replacing them accordingly.

Some possibly useful modifications:

regex = '|'.join(map(re.escape,replace_dict)) -- Allows the words to contain characters that have a special meaning in regular expression syntax (like parentheses). This escapes the special characters so that the regular expression matches the literal text.

regex = '|'.join(r'\b{0}\b'.format(x) for x in replace_dict) -- Makes sure that we don't match if one of our words is a substring of another word. In other words, change he to she but not the to tshe.
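The two modifications can be tried side by side with a sketch like the following. The function name swap and the whole_words flag are hypothetical, introduced here only for illustration. Note that \b only behaves as expected when a key starts and ends with a word character.

```python
import re


def swap(text, replace_dict, whole_words=False):
    # Escape each key so characters like "+" or "(" match literally.
    parts = [re.escape(x) for x in replace_dict]
    if whole_words:
        # Require a word boundary on both sides of each key.
        parts = [r"\b{0}\b".format(p) for p in parts]
    regex = "|".join(parts)
    return re.sub(regex, lambda m: replace_dict[m.group()], text)


print(swap("1+1 and 1x1", {"1+1": "2"}))                       # 2 and 1x1
print(swap("he saw the dog", {"he": "she"}))                   # she saw tshe dog
print(swap("he saw the dog", {"he": "she"}, whole_words=True)) # she saw the dog
```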
Note: Considering the viewership of this question, I undeleted this answer and rewrote it for different types of test cases.
I have considered four competing implementations from the answers
A generalized timeit function
And the generalized test routine
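A rough sketch of how a timing helper and test routine of this kind might be put together with the standard timeit module. The candidate functions swap_temp and swap_regex, the helper names time_it and run_tests, and the "\x00" placeholder are all assumptions made for illustration; they are not the implementations that were actually measured, and no timings are implied.

```python
import re
import timeit

TEXT = "The scary ghost ordered an expensive steak"
EXPECTED = "The scary steak ordered an expensive ghost"


def swap_temp(s):
    # Temp-value approach: "\x00" is assumed not to occur in the text.
    placeholder = "\x00"
    s = s.replace("steak", placeholder)
    s = s.replace("ghost", "steak")
    return s.replace(placeholder, "ghost")


def swap_regex(s):
    table = {"steak": "ghost", "ghost": "steak"}
    return re.sub("steak|ghost", lambda m: table[m.group()], s)


def time_it(func, arg, number=1000):
    # Total seconds for `number` calls of func(arg).
    return timeit.timeit(lambda: func(arg), number=number)


def run_tests(candidates, text, expected):
    results = {}
    for func in candidates:
        # Check correctness before timing each candidate.
        assert func(text) == expected, func.__name__
        results[func.__name__] = time_it(func, text)
    return results


timings = run_tests([swap_temp, swap_regex], TEXT, EXPECTED)
for name, seconds in sorted(timings.items()):
    print("{0}: {1:.6f}s".format(name, seconds))
```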
And the test results are as follows
Based on the test results:
The non-regex list comprehension (LC) and the temp variable substitution perform best, though the performance of the temp variable approach is not consistent
The LC version performs better than the generator version (confirmed)
The regex version is more than two times slower (so if this piece of code is a bottleneck, a change of implementation can be considered)
The regex and non-regex versions are equally robust and both scale
How about something like this? Store the original in a split list, then have a translation dict. Keeps your core code short, then just adjust the dict when you need to adjust the translation. Plus, easy to port to a function:
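A minimal sketch of this idea, assuming the text is split on whitespace. The names translate_words and translation are hypothetical.

```python
def translate_words(text, translation):
    # Split into words, swap any word found in the dict, and rejoin.
    words = text.split()
    return " ".join(translation.get(word, word) for word in words)


translation = {"ghost": "steak", "steak": "ghost"}
translated = translate_words("The scary ghost ordered an expensive steak", translation)
print(translated)  # The scary steak ordered an expensive ghost
```

With this sketch, runs of whitespace collapse to single spaces, and a word with punctuation attached (such as "steak.") is not found in the dict and is left unchanged.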
Split the string by one of the targets, do the replace, and put the whole thing back together.
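One way this could be written, as a sketch. The function name swap_pair and its parameter names are hypothetical.

```python
def swap_pair(text, a, b):
    # Split on a, turn every b into a inside the pieces,
    # then rejoin the pieces with b where a used to be.
    return b.join(part.replace(b, a) for part in text.split(a))


print(swap_pair("The scary ghost ordered an expensive steak", "steak", "ghost"))
# The scary steak ordered an expensive ghost
```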
This works exactly as .replace() would, including ignoring word boundaries. So it will turn "steak ghosts" into "ghost steaks".

Use the count argument of the string.replace() method (see http://docs.python.org/2/library/stdtypes.html). So using your code, you would have:
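A sketch of the count-based version of the question's code. It relies on an assumption worth stating: each word occurs once, and "ghost" appears before "steak" in the text, so the first "ghost" found in the second step is the original one and not the one just inserted.

```python
s = "The scary ghost ordered an expensive steak"

# Replace only the first "steak"; the text now contains "ghost" twice.
s = s.replace("steak", "ghost", 1)

# Replace only the first "ghost", which is the original one in this text.
s = s.replace("ghost", "steak", 1)

print(s)  # The scary steak ordered an expensive ghost
```

For text with a different word order or repeated words, this approach does not give a general swap.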