I happened to find myself having a basic filtering need: I have a list and I have to filter it by an attribute of the items.
My code looked like this:
my_list = [x for x in my_list if x.attribute == value]
But then I thought, wouldn't it be better to write it like this?
my_list = filter(lambda x: x.attribute == value, my_list)
It's more readable, and if needed for performance the lambda could be taken out to gain something.
Question is: are there any caveats in using the second way? Any performance difference? Am I missing the Pythonic Way™ entirely and should do it in yet another way (such as using itemgetter instead of the lambda)?
Although
filter
may be the "faster way", the "Pythonic way" would be not to care about such things unless performance is absolutely critical (in which case you wouldn't be using Python!).Since any speed difference is bound to be miniscule, whether to use filters or list comprehensions comes down to a matter of taste. In general I'm inclined to use comprehensions (which seems to agree with most other answers here), but there is one case where I prefer
filter
.A very frequent use case is pulling out the values of some iterable X subject to a predicate P(x):
but sometimes you want to apply some function to the values first:
As a specific example, consider
I think this looks slightly better than using
filter
. But now considerIn this case we want to
filter
against the post-computed value. Besides the issue of computing the cube twice (imagine a more expensive calculation), there is the issue of writing the expression twice, violating the DRY aesthetic. In this case I'd be apt to useIt is strange how much beauty varies for different people. I find the list comprehension much clearer than
filter
+lambda
, but use whichever you find easier. However, do stop giving your variables names already used for built-ins, that's confusing.There are two things that may slow down your use of
filter
.The first is the function call overhead: as soon as you use a Python function (whether created by
def
orlambda
) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn't think much about performance until you've timed your code and found it to be a bottleneck, but the difference will be there.The other overhead that might apply is that the lambda is being forced to access a scoped variable (
value
). That is slower than accessing a local variable and in Python 2.x the list comprehension only accesses local variables. If you are using Python 3.x the list comprehension runs in a separate function so it will also be accessingvalue
through a closure and this difference won't apply.The other option to consider is to use a generator instead of a list comprehension:
Then in your main code (which is where readability really matters) you've replaced both list comprehension and filter with a hopefully meaningful function name.
My take
Filter is just that. It filters out the elements of a list. You can see the definition mentions the same(in the official docs link I mentioned before). Whereas, list comprehension is something that produces a new list after acting upon something on the previous list.(Both filter and list comprehension creates new list and not perform operation in place of the older list. A new list here is something like a list with, say, an entirely new data type. Like converting integers to string ,etc)
In your example, it is better to use filter than list comprehension, as per the definition. However, if you want, say other_attribute from the list elements, in your example is to be retrieved as a new list, then you can use list comprehension.
This is how I actually remember about filter and list comprehension. Remove a few things within a list and keep the other elements intact, use filter. Use some logic on your own at the elements and create a watered down list suitable for some purpose, use list comprehension.
I thought I'd just add that in python 3, filter() is actually an iterator object, so you'd have to pass your filter method call to list() in order to build the filtered list. So in python 2:
lists b and c have the same values, and were completed in about the same time as filter() was equivalent [x for x in y if z]. However, in 3, this same code would leave list c containing a filter object, not a filtered list. To produce the same values in 3:
The problem is that list() takes an iterable as it's argument, and creates a new list from that argument. The result is that using filter in this way in python 3 takes up to twice as long as the [x for x in y if z] method because you have to iterate over the output from filter() as well as the original list.