I want to remove characters from a string in Python:
string.replace(',', '').replace("!", '').replace(":", '').replace(";", '')...
But I have many characters I have to remove. I thought about a list
list = [',', '!', '.', ';'...]
But how can I use the list to replace the characters in the string?
If you're using Python 2 and your inputs are byte strings (str, not unicode), by far the best method is str.translate, called with None as the table and the characters to delete as the second argument.

Otherwise, there are the following options to consider:

A. Iterate over the subject character by character, omit the unwanted characters, and join the resulting list. (Note that the generator version, ''.join(c for c ...), will be less efficient.)

B. Create a regular expression on the fly and re.sub with an empty string. (re.escape ensures that characters like ^ or ] won't break the regular expression.)

C. Use the mapping variant of translate, which takes a dictionary from code points to replacements (None deletes the character).

Full testing code and timings:
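A minimal sketch of such a test, assuming Python 3, covering options A to C and the bytes counterpart of the Python 2 str.translate call. Only remove_chars_translate_bytes is a name taken from the text; the other function names, CHARS, SUBJ, and the repetition count are illustrative assumptions, and the sketch makes no claim about the resulting timings.

```python
import re
import timeit

CHARS = [',', '!', '.', ';']       # characters to remove (from the question)
SUBJ = 'A.B!C?D,E;F' * 1000        # hypothetical sample input


def remove_chars_iter(subj, chars):
    # Option A: keep the wanted characters and join the resulting list.
    unwanted = set(chars)
    return ''.join([c for c in subj if c not in unwanted])


def remove_chars_re(subj, chars):
    # Option B: build a character class on the fly; re.escape protects
    # characters such as ^ and ] that are special inside [...].
    return re.sub('[' + re.escape(''.join(chars)) + ']', '', subj)


def remove_chars_translate_unicode(subj, chars):
    # Option C: mapping variant of translate; None deletes the code point.
    return subj.translate({ord(c): None for c in chars})


def remove_chars_translate_bytes(subj, chars):
    # Bytes counterpart of Python 2's str.translate(None, deletechars).
    # Assumes subj is bytes and every character in chars is ASCII.
    return subj.translate(None, ''.join(chars).encode('ascii'))


if __name__ == '__main__':
    for fn in (remove_chars_iter, remove_chars_re,
               remove_chars_translate_unicode):
        print(fn.__name__, timeit.timeit(lambda: fn(SUBJ, CHARS), number=100))
    raw = SUBJ.encode('ascii')
    print('remove_chars_translate_bytes',
          timeit.timeit(lambda: remove_chars_translate_bytes(raw, CHARS),
                        number=100))
```

Timings depend heavily on the interpreter version and on the input, so any numbers this prints are only meaningful relative to each other on one machine.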
Results:
(As a side note, the figure for remove_chars_translate_bytes might give us a clue why the industry was reluctant to adopt Unicode for such a long time.)

Also, an interesting related topic covers removing accents from a UTF-8 string by converting each character to its standard non-accented equivalent:
What is the best way to remove accents in a python unicode string?
Code extract from the topic:
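A minimal sketch of the usual standard-library approach to this, assuming Python 3; the function name remove_accents is illustrative and the code may differ from the extract in the linked topic. It decomposes each character with unicodedata.normalize and then drops the combining marks.

```python
import unicodedata


def remove_accents(input_str):
    # NFKD decomposition splits an accented letter into its base letter
    # followed by one or more combining marks.
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    # Keep only the characters that are not combining marks.
    return ''.join(c for c in nfkd_form if not unicodedata.combining(c))


print(remove_accents('café Málaga'))  # prints: cafe Malaga
```

Characters that have no decomposition (for example 'ø' or 'ß') pass through unchanged with this approach.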
How about this one-liner?
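One possible one-liner, sketched under the assumption of Python 3 (where reduce lives in functools); the variable names and sample text are illustrative. It folds the chain of replace calls from the question over the list of characters.

```python
from functools import reduce

chars_to_remove = [',', '!', '.', ';']
text = 'Hello, world! Bye; now.'

# Apply text.replace(ch, '') once for every character in the list.
cleaned = reduce(lambda s, ch: s.replace(ch, ''), chars_to_remove, text)

print(cleaned)  # prints: Hello world Bye now
```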
Another approach using regex:
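A minimal sketch of one regex-based variant, assuming Python 3; the function name remove_chars_regex is hypothetical. It joins the escaped characters into an alternation and substitutes every match with an empty string.

```python
import re


def remove_chars_regex(text, chars_to_remove):
    # Escape each character so that regex metacharacters such as '.'
    # are matched literally, then join them with '|' (alternation).
    pattern = re.compile('|'.join(map(re.escape, chars_to_remove)))
    return pattern.sub('', text)


print(remove_chars_regex('Hello, world! Bye; now.', [',', '!', '.', ';']))
# prints: Hello world Bye now
```

Compiling the pattern once and reusing it is worthwhile when the same set of characters is removed from many strings.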
Why not a simple loop?
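A minimal sketch of such a loop, assuming Python 3; the function and parameter names are illustrative.

```python
def remove_chars_loop(text, chars_to_remove):
    # Strings are immutable, so rebind text to the result of each replace.
    for ch in chars_to_remove:
        text = text.replace(ch, '')
    return text


print(remove_chars_loop('Hello, world! Bye; now.', [',', '!', '.', ';']))
# prints: Hello world Bye now
```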
Also, avoid naming lists 'list'. It shadows the built-in function list.

You can use str.translate(). Example:
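A minimal sketch, assuming Python 3, where the deletion table is built with the three-argument form of str.maketrans (in Python 2 the equivalent call on a byte string is s.translate(None, ',!.;')). The names CHARS_TO_REMOVE, TABLE, and strip_chars are illustrative.

```python
CHARS_TO_REMOVE = ',!.;'

# With three arguments, every character in the third argument
# is mapped to None, which makes translate() delete it.
TABLE = str.maketrans('', '', CHARS_TO_REMOVE)


def strip_chars(text):
    return text.translate(TABLE)


print(strip_chars('Hello, world! Bye; now.'))  # prints: Hello world Bye now
```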