I recently posted a question using a lambda function, and in a reply someone mentioned that lambda is going out of favor and that I should use list comprehensions instead. I am relatively new to Python, so I ran a simple test:
import time
S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in S for y in T if x==y]
time2 = time.time()
print 'time diff [x for x in S for y in T if x==y]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N
#
#
#http://snipt.net/voyeg3r/python-intersect-lists/
time1 = time.time()
N = [val for val in S if val in T]
time2 = time.time()
print 'time diff [val for val in S if val in T]=', time2-time1
#print N
#
#
time1 = time.time()
N= list(set(S) & set(T))
time2 = time.time()
print 'time diff list(set(S) & set(T))=', time2-time1
#print N #the results will be unordered as compared to the other ways!!!
#
#
time1 = time.time()
N=[]
for x in S:
    for y in T:
        if x==y:
            N.append(x)
time2 = time.time()
print 'time diff using traditional for loop', time2-time1
#print N
They all print the same N, so I commented that print statement out (except for the set version, whose result is unordered), but the resulting time differences were interesting over repeated tests, as seen in this one example:
time diff [x for x in S for y in T if x==y]= 54.875
time diff filter(lambda x:x in S,T)= 0.391000032425
time diff [val for val in S if val in T]= 12.6089999676
time diff list(set(S) & set(T))= 0.125
time diff using traditional for loop 54.7970001698
So while I find list comprehensions on the whole easier to read, there seem to be some performance issues, at least in this example.
So, two questions:
Why is lambda etc. being pushed aside?
For the list comprehension ways, is there a more efficient implementation, and how would you KNOW it's more efficient without testing? I mean, lambda/map/filter was supposed to be less efficient because of the extra function calls, but it seems to be MORE efficient.
Paul
List comprehensions can make a bigger difference if you have to process your filtered results. In your case you just build a list, but if you had to do something like this:
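(a sketch, with f() as a placeholder for whatever per-item processing you need)

N = [f(x) for x in S if x in T]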
you would gain from LC optimization over this:
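(the equivalent built from map() and filter(), with the same placeholder f())

N = map(f, filter(lambda x: x in T, S))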
simply because the latter has to build an intermediate list (or tuple, or string, depending on the nature of S). As a consequence, you will also notice a difference in the memory used by each method: the LC version stays lower.
The lambda in itself does not matter.
Your list comprehension and lambda are doing different things; the list comprehension matching the lambda would be
[val for val in T if val in S]
Efficiency isn't the reason why list comprehensions are preferred (though they actually are slightly faster in almost all cases). The reason why they are preferred is readability.
Try it with a smaller loop body and larger loops: for example, make T a set and iterate over S. In that case, on my machine the list comprehension is nearly twice as fast, as in the sketch below.
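A quick sketch of that variant (timings will vary by machine; the point is the constant-time set membership test):

import time

S = range(1000000)
T = set(y**2 for y in range(300))   # set membership is O(1) on average

time1 = time.time()
N = [x for x in S if x in T]        # list comprehension over the big list
time2 = time.time()
print 'list comprehension:', time2 - time1

time1 = time.time()
N = filter(lambda x: x in T, S)     # same filtering via filter/lambda
time2 = time.time()
print 'filter/lambda:', time2 - time1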
Q: Why is lambda etc being pushed aside?
A: List comprehensions and generator expressions are generally considered to be a nice mix of power and readability. The pure functional-programming style where you use map(), reduce(), and filter() with functions (often lambda functions) is considered not as clear. Also, Python has added built-in functions that nicely handle all the major uses for reduce().
Suppose you wanted to sum a list. Here are two ways of doing it.
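For example, with a simple list of numbers (a minimal sketch; reduce() is a builtin in Python 2):

lst = range(10)

# idiomatic: a single builtin call
print sum(lst)

# functional style: reduce() plus a lambda
print reduce(lambda x, y: x + y, lst)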
Sign me up as a fan of sum() and not a fan of reduce() to solve this problem. Here's another, similar problem:
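For instance, testing whether any item in a list is true (again a sketch):

lst = range(10)

# idiomatic: short-circuits at the first true value
print any(lst)

# functional style: always walks the whole list; yields a truthy value rather than True/False
print reduce(lambda x, y: x or y, lst)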
Not only is the any() solution easier to understand, but it's also much faster; it has short-circuit evaluation, so it stops evaluating as soon as it has found any true value. The reduce() version has to crank through the entire list. This performance difference would be stark if the list were a million items long and the first item evaluated true. By the way, any() was added in Python 2.5; if you don't have it, here is a version for older versions of Python:
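def any(iterable):
    for x in iterable:
        if x:
            return True
    return False

Suppose you wanted to make a list of squares of even numbers from some list. Two ways again (a sketch):

lst = range(10)

# map/filter/lambda: two passes, two temporary lists in Python 2
print map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst))

# list comprehension: one pass, one result list
print [x**2 for x in lst if x % 2 == 0]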
Now suppose you wanted to sum that list of squares.
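A sketch of the generator-expression version next to the map()-based one:

lst = range(10)

# generator expression: sum() consumes values one at a time, no intermediate list
print sum(x**2 for x in lst if x % 2 == 0)

# map/filter: builds intermediate lists first, then sums and discards them
print sum(map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst)))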
The generator expression actually just returns an iterable object. sum() takes the iterable and pulls values from it, one by one, summing as it goes, until all the values are consumed. This is the most efficient way you can solve this problem in Python. In contrast, the map() solution, and the equivalent solution with a list comprehension inside the call to sum(), must first build a list; this list is then passed to sum(), used once, and discarded. The time to build the list and then delete it again is just wasted. (EDIT: and note that the version with both map and filter must build two lists, one built by filter and one built by map; both lists are discarded.) (EDIT: But in Python 3.0 and newer, map() and filter() are now both "lazy" and produce an iterator instead of a list, so this point is less true than it used to be. Also, in Python 2.x you could use itertools.imap() and itertools.ifilter() for iterator-based map and filter. But I continue to prefer the generator expression solutions over any map/filter solutions.)
By composing map(), filter(), and reduce() in combination with lambda functions, you can do many powerful things. But Python has idiomatic ways to solve the same problems which are simultaneously better performing and easier to read and understand.

Your tests are doing very different things. With S being 1M elements and T being 300:
N = [x for x in S for y in T if x==y]
This option does 300M equality comparisons.

N = filter(lambda x: x in S, T)
This option does 300 linear searches through S.

N = [val for val in S if val in T]
This option does 1M linear searches through T.

N = list(set(S) & set(T))
This option does two set constructions and one set intersection.
The differences in performance between these options are much more related to the algorithms each one is using than to any difference between list comprehensions and lambda.