Imagine a string like 'Agh#$%#%2341- -!zdrkfd'. I want to perform some operation on it such that only the lowercase letters are returned (as an example), which in this case would give 'ghzdrkfd'.
How do you do this in Python? The obvious way would be to create a list of the characters 'a' through 'z', then iterate over the characters in my string and build a new string, character by character, from only those that appear in my list. This seems primitive.
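For reference, here is a minimal Python 3 sketch of that loop-based approach, assuming string.ascii_lowercase as the whitelist. The function name keep_lowercase_loop is hypothetical, and collecting characters in a list and joining once is a small deviation from literally concatenating character by character (it avoids building many intermediate strings).

```python
import string

def keep_lowercase_loop(s):
    # Whitelist of allowed characters: 'a' through 'z'.
    allowed = string.ascii_lowercase
    result = []
    for ch in s:
        # Keep the character only if it is in the whitelist.
        if ch in allowed:
            result.append(ch)
    # Join once at the end instead of repeated string concatenation.
    return "".join(result)

print(keep_lowercase_loop('Agh#$%#%2341- -!zdrkfd'))  # ghzdrkfd
```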
I was wondering if regular expressions are appropriate. Replacing unwanted characters seems problematic, and I tend to prefer whitelisting over blacklisting. The .match function does not seem appropriate. I have looked over the relevant page on the Python site, but have not found a method that seems to fit.
If regular expressions are not appropriate and the correct approach is looping, is there a simple function which "explodes" a string into a list? Or am I just hitting another for loop there?
If you are looking for efficiency, the translate function is the fastest option.
It can be used to quickly replace characters and/or delete them.
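In Python 2.6 and later, you no longer need to pass the identity table: str.translate accepts None as its table argument, so only the characters to delete are required.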
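The Python 2.6+ form is s.translate(None, deletechars). Python 3's str.translate takes a mapping instead, and str.maketrans("", "", deletechars) builds one that deletes the listed characters. The sketch below is a Python 3 equivalent, not the answer's original code; it assumes ASCII input (non-ASCII characters would pass through untouched), and the names DELETE_TABLE and keep_lowercase_translate are hypothetical.

```python
import string

# Assumption: ASCII input. Every ASCII character that is NOT a lowercase
# letter is listed for deletion.
_UNWANTED = "".join(chr(i) for i in range(128)
                    if chr(i) not in string.ascii_lowercase)

# The third argument of str.maketrans lists characters to delete.
# Build the table once at module level and reuse it on every call.
DELETE_TABLE = str.maketrans("", "", _UNWANTED)

def keep_lowercase_translate(s):
    return s.translate(DELETE_TABLE)

print(keep_lowercase_translate('Agh#$%#%2341- -!zdrkfd'))  # ghzdrkfd
```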
This method is much faster than any other. Of course, you need to store the delete_table somewhere and reuse it. But even if you don't store it and build it every time, it is still faster than the other methods suggested so far.
To support my claims, here are the results:
Running the regular expression solution:
[Upon request] If you pre-compile the regular expression:
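The exact pattern used in these timings is not shown, so this is an assumed equivalent: a pre-compiled pattern that removes every run of characters outside a-z. Although the character class is negated, the characters it names are still the whitelist, which may ease the concern about blacklisting. The names NOT_LOWER and keep_lowercase_regex are hypothetical.

```python
import re

# Compiled once at import time; [^a-z]+ matches any run of characters
# outside the a-z whitelist, and sub() replaces each run with "".
NOT_LOWER = re.compile(r"[^a-z]+")

def keep_lowercase_regex(s):
    return NOT_LOWER.sub("", s)

print(keep_lowercase_regex('Agh#$%#%2341- -!zdrkfd'))  # ghzdrkfd
```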
Running the translate method the same number of times took:
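To run a comparison like this on your own machine, here is a sketch using the standard timeit module. It is written for Python 3 and assumes ASCII input; the three candidates are stand-ins for the approaches discussed in this thread, not the exact code that produced the timings above. Absolute numbers will vary by machine and Python version, so no output is shown.

```python
import re
import string
import timeit

S = 'Agh#$%#%2341- -!zdrkfd'

# Assumption: ASCII input, as in the question's example.
DELETE_TABLE = str.maketrans(
    "", "", "".join(chr(i) for i in range(128)
                    if chr(i) not in string.ascii_lowercase))
NOT_LOWER = re.compile(r"[^a-z]+")
WHITELIST = set(string.ascii_lowercase)

candidates = {
    "translate": lambda: S.translate(DELETE_TABLE),
    "compiled regex": lambda: NOT_LOWER.sub("", S),
    "comprehension": lambda: "".join([c for c in S if c in WHITELIST]),
}

# Sanity check: all approaches must agree before the timings mean anything.
assert len({f() for f in candidates.values()}) == 1

for name, f in candidates.items():
    seconds = timeit.timeit(f, number=100000)
    print(f"{name}: {seconds:.3f} s")
```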
Here's one solution if you are specifically interested in working on strings:
The whitelist is actually a set (not a list) for efficiency.
If you need a string, use join():
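A minimal sketch of this answer's idea, assuming the whitelist is string.ascii_lowercase: a set for fast membership tests, a list comprehension that yields a list of characters, and join() to turn it back into a string.

```python
import string

# A set gives O(1) average-time membership tests.
whitelist = set(string.ascii_lowercase)

s = 'Agh#$%#%2341- -!zdrkfd'
chars = [c for c in s if c in whitelist]   # a list of characters
result = "".join(chars)                    # back to a single string

print(chars)   # ['g', 'h', 'z', 'd', 'r', 'k', 'f', 'd']
print(result)  # ghzdrkfd
```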
filter() is a more generic solution. From the documentation (http://docs.python.org/library/functions.html):
This would be one way of using filter():
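A sketch of filter() with the same set-based whitelist. In Python 2, filter() on a string returns a string directly; in Python 3 it returns a lazy iterator, so the sketch (written for Python 3) joins the result back into a string.

```python
import string

whitelist = set(string.ascii_lowercase)
s = 'Agh#$%#%2341- -!zdrkfd'

# filter(function, iterable) keeps the items for which function returns
# a true value; join() turns the Python 3 iterator back into a string.
result = "".join(filter(lambda c: c in whitelist, s))
print(result)  # ghzdrkfd
```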
String objects are iterable; there is no need to "explode" the string into a list. You can put whatever condition you want in the list comprehension, and it will filter characters accordingly.
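To illustrate both points: list() does "explode" a string if you really want a list, but iterating over the string directly is enough, and the filtering condition can be any expression. The range check and isdigit() below are just two example conditions.

```python
s = 'Agh#$%#%2341- -!zdrkfd'

# If you really want a list of characters, list() "explodes" the string.
print(list("abc"))  # ['a', 'b', 'c']

# Iterating over the string directly works just as well; here the
# condition is a range check instead of a set lookup.
lower_only = "".join(c for c in s if "a" <= c <= "z")
print(lower_only)  # ghzdrkfd

# A different condition filters differently, e.g. keep only digits.
digits_only = "".join(c for c in s if c.isdigit())
print(digits_only)  # 2341
```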
You could also implement this using a regex, but this will only hide the loop. The regular expressions library will still have to loop through the characters of the string in order to filter them.
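For comparison, a regex can express the whitelist positively rather than by replacing unwanted characters: re.findall collects every run of allowed characters, and join() concatenates them. As noted, the regex engine still scans every character.

```python
import re

s = 'Agh#$%#%2341- -!zdrkfd'

# Find every run of whitelisted characters, then join the runs.
runs = re.findall(r"[a-z]+", s)   # ['gh', 'zdrkfd']
result = "".join(runs)
print(result)  # ghzdrkfd
```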
A more generic and understandable solution is to take an inputstring and filter it against a whitelist of characters:
This prints:
The first string.translate call removes all characters in the whitelist from the inputstring. This gives us the characters we want to remove. The second string.translate call removes those from the inputstring and produces the desired result.
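The two-step idea above can be sketched in Python 3, where str.translate takes a table built by str.maketrans("", "", chars_to_delete) rather than Python 2's string.translate arguments. This is a port under that assumption, not the answer's original code; the function name filter_whitelist is hypothetical, while the parameter names follow the prose.

```python
import string

def filter_whitelist(inputstring, whitelist):
    # Step 1: delete every whitelisted character; what is left are
    # exactly the characters we want to get rid of.
    unwanted = inputstring.translate(str.maketrans("", "", whitelist))
    # Step 2: delete those unwanted characters from the original input.
    return inputstring.translate(str.maketrans("", "", unwanted))

print(filter_whitelist('Agh#$%#%2341- -!zdrkfd', string.ascii_lowercase))  # ghzdrkfd
```

Because the unwanted set is computed from the input itself, this version also works for non-ASCII input, at the cost of building two tables per call.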