I recently posted a question using a lambda function, and in a reply someone mentioned that lambda is falling out of favor and that list comprehensions should be used instead. I am relatively new to Python, so I ran a simple test:
import time
S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in S for y in T if x==y]
time2 = time.time()
print 'time diff [x for x in S for y in T if x==y]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N
#
#
#http://snipt.net/voyeg3r/python-intersect-lists/
time1 = time.time()
N = [val for val in S if val in T]
time2 = time.time()
print 'time diff [val for val in S if val in T]=', time2-time1
#print N
#
#
time1 = time.time()
N= list(set(S) & set(T))
time2 = time.time()
print 'time diff list(set(S) & set(T))=', time2-time1
#print N #the results will be unordered as compared to the other ways!!!
#
#
time1 = time.time()
N=[]
for x in S:
    for y in T:
        if x==y:
            N.append(x)
time2 = time.time()
print 'time diff using traditional for loop', time2-time1
#print N
They all print the same N, so I commented out the print statements (except that the last method's results are unordered). The resulting time differences were interesting, as in this one instance of repeated trials:
time diff [x for x in S for y in T if x==y]= 54.875
time diff filter(lambda x:x in S,T)= 0.391000032425
time diff [val for val in S if val in T]= 12.6089999676
time diff list(set(S) & set(T))= 0.125
time diff using traditional for loop 54.7970001698
So while I find list comprehensions easier to read overall, there seem to be some performance issues, at least in this example.
So, two questions:
Why is lambda etc. being pushed aside?
Is there a more efficient way to write the list comprehension versions, and how would you know it is more efficient without testing? I mean, lambda/map/filter were supposed to be less efficient because of the extra function calls, but they appear to be more efficient.
保罗
Answer 1:
Your tests are doing very different things. With S being 1M elements and T being 300:
[x for x in S for y in T if x==y]= 54.875
This option does 300M equality comparisons.
filter(lambda x:x in S,T)= 0.391000032425
This option does 300 linear searches through S.
[val for val in S if val in T]= 12.6089999676
This option does 1M linear searches through T.
list(set(S) & set(T))= 0.125
This option builds two set structures and computes one intersection.
The differences in performance between these options have much more to do with the algorithm each one is using than with any difference between list comprehensions and lambda.
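To make the algorithmic point concrete, here is a small sketch (Python 3 syntax, not from the original answer) showing the cheapest fix to the question's slow versions: convert the list being searched into a set, so each membership test becomes a hash lookup instead of a linear scan.

```python
# Membership tests: "x in list" is a linear scan, O(len(S)) each time;
# "x in set" is an average O(1) hash lookup.
S = list(range(1000000))
T = [y**2 for y in range(300)]

S_set = set(S)                       # built once, O(len(S))
N = [x for x in T if x in S_set]     # then only 300 cheap lookups

print(len(N))  # 300: every square of 0..299 is below 1,000,000
```

The one-time cost of building the set is quickly repaid once more than a handful of lookups are performed against it.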
Answer 2:
When I fix up your code so that the list comprehension and the call to filter are actually doing the same thing, things change a whole lot:
import time
S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in T if x in S]
time2 = time.time()
print 'time diff [x for x in T if x in S]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N
Then the output is more like:
time diff [x for x in T if x in S]= 0.414485931396
time diff filter(lambda x:x in S,T)= 0.466315984726
So the list comprehension's time is usually pretty close to, and usually less than, the lambda expression's.
The reason lambda expressions are being phased out is that many people think they are a lot less readable than list comprehensions. I sort of grudgingly agree.
Answer 3:
Q: Why is lambda etc being pushed aside?
A: List comprehensions and generator expressions are generally considered to be a nice mix of power and readability. The pure functional-programming style where you use map(), reduce(), and filter() with functions (often lambda functions) is considered not as clear. Also, Python has added built-in functions that nicely handle all the major uses for reduce().
Suppose you wanted to sum a list. Here are two ways of doing it.
lst = range(10)
print reduce(lambda x, y: x + y, lst)
print sum(lst)
Sign me up as a fan of sum() and not a fan of reduce() to solve this problem. Here's another, similar problem:
lst = range(10)
print reduce(lambda x, y: bool(x or y), lst)
print any(lst)
Not only is the any() solution easier to understand, but it's also much faster; it has short-circuit evaluation, such that it will stop evaluating as soon as it has found any true value. The reduce() has to crank through the entire list. This performance difference would be stark if the list was a million items long and the first item evaluated true. By the way, any() was added in Python 2.5; if you don't have it, here is a version for older versions of Python:
def any(iterable):
    for x in iterable:
        if x:
            return True
    return False
Suppose you wanted to make a list of squares of even numbers from some list.
lst = range(10)
print map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst))
print [x**2 for x in lst if x % 2 == 0]
Now suppose you wanted to sum that list of squares.
lst = range(10)
print sum(map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst)))
# list comprehension version of the above
print sum([x**2 for x in lst if x % 2 == 0])
# generator expression version; note the lack of '[' and ']'
print sum(x**2 for x in lst if x % 2 == 0)
The generator expression actually just returns an iterable object. sum() takes the iterable and pulls values from it, one by one, summing as it goes, until all the values are consumed. This is the most efficient way you can solve this problem in Python. In contrast, the map() solution, and the equivalent solution with a list comprehension inside the call to sum(), must first build a list; this list is then passed to sum(), used once, and discarded. The time to build the list and then delete it again is just wasted. (EDIT: and note that the version with both map and filter must build two lists, one built by filter and one built by map; both lists are discarded.) (EDIT: But in Python 3.0 and newer, map() and filter() are now both "lazy" and produce an iterator instead of a list, so this point is less true than it used to be. Also, in Python 2.x you were able to use itertools.imap() and itertools.ifilter() for iterator-based map and filter. But I continue to prefer the generator expression solutions over any map/filter solutions.)
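The Python 3 behavior mentioned in that edit is easy to observe directly; a small illustrative snippet (Python 3, not part of the original answer):

```python
# In Python 3, map() is lazy: it returns an iterator,
# and no work happens until values are pulled from it.
m = map(lambda x: x**2, range(5))
print(type(m).__name__)   # 'map' -- an iterator object, not a list
print(list(m))            # [0, 1, 4, 9, 16]
print(list(m))            # [] -- the iterator is now exhausted
```

The same holds for filter(); both must be wrapped in list() if you actually need a list.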
By composing map(), filter(), and reduce() in combination with lambda functions, you can do many powerful things. But Python has idiomatic ways to solve the same problems which are simultaneously better performing and easier to read and understand.
Answer 4:
Many people have already pointed out that you're comparing apples with oranges, etc., etc. But I think nobody has shown how a really simple comparison, list comprehension vs map plus lambda with little else to get in the way, might go:
$ python -mtimeit -s'L=range(1000)' 'map(lambda x: x+1, L)'
1000 loops, best of 3: 328 usec per loop
$ python -mtimeit -s'L=range(1000)' '[x+1 for x in L]'
10000 loops, best of 3: 129 usec per loop
Here you can see very plainly the cost of the lambda: about 200 microseconds, which in the case of a sufficiently simple operation such as this one swamps the operation itself.
The numbers are of course very similar with filter, since the problem is not map or filter, but the lambda itself:
$ python -mtimeit -s'L=range(1000)' '[x for x in L if not x%7]'
10000 loops, best of 3: 162 usec per loop
$ python -mtimeit -s'L=range(1000)' 'filter(lambda x: not x%7, L)'
1000 loops, best of 3: 334 usec per loop
No doubt the fact that lambda can be less clear, or its weird connection with Sparta (Spartans had a Lambda, for "Lakedaimon", painted on their shields; this suggests lambda is rather dictatorial and bloody ;-) has at least as much to do with its slowly falling out of fashion as its performance costs do. But the latter are quite real.
Answer 5:
First of all, test like this:
import timeit
S=[x for x in range(10000)]
T=[y**2 for y in range(30)]
print "v1", timeit.Timer('[x for x in S for y in T if x==y]',
'from __main__ import S,T').timeit(100)
print "v2", timeit.Timer('filter(lambda x:x in S,T)',
'from __main__ import S,T').timeit(100)
print "v3", timeit.Timer('[val for val in T if val in S]',
'from __main__ import S,T').timeit(100)
print "v4", timeit.Timer('list(set(S) & set(T))',
'from __main__ import S,T').timeit(100)
Basically you are doing a different thing in each timed test. If you rewrote the list comprehension, for example, as
[val for val in T if val in S]
performance would be on par with the "lambda/filter" construct.
Answer 6:
Sets are the correct solution for this. However, try swapping S and T and see how long that takes!
filter(lambda x:x in T,S)
$ python -m timeit -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' 'filter(lambda x:x in S,T)'
10 loops, best of 3: 485 msec per loop
$ python -m timeit -r1 -n1 -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' 'filter(lambda x:x in T,S)'
1 loops, best of 1: 19.6 sec per loop
So you see that the order of S and T is quite important.
Changing the order of the list comprehension to match the filter gives
$ python -m timeit -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' '[x for x in T if x in S]'
10 loops, best of 3: 441 msec per loop
So in fact the list comprehension is slightly faster than the lambda on my computer.
Answer 7:
Your list comprehension and lambda are doing different things; the list comprehension matching the lambda would be [val for val in T if val in S].
Efficiency is not the reason why list comprehensions are preferred (though they actually are slightly faster in almost all cases). The reason they are preferred is readability.
Try it with a smaller loop body and larger loops, like making T a set and iterating over S. In that case, on my machine, the list comprehension is nearly twice as fast.
Answer 8:
Your profiling is done wrong. Take a look at the timeit module and try again.
lambda defines anonymous functions. Their main problem is that many people do not know the whole Python library and use them to re-implement functions that already exist in the operator, functools, etc. modules (and are much faster).
List comprehensions have nothing to do with lambda. They are equivalent to the standard filter and map functions from functional languages. LCs are preferred because they can also be used as generators, not to mention readability.
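As a sketch of the operator-module point above (Python 3 syntax, where reduce has moved into functools; this example is mine, not the answerer's): the standard library already ships the function a lambda would re-implement, and the built-in is clearer than either.

```python
from functools import reduce
import operator

lst = list(range(10))

# Re-implementing addition with a lambda...
total_lambda = reduce(lambda x, y: x + y, lst)

# ...when operator.add already exists (implemented in C):
total_operator = reduce(operator.add, lst)

# ...and sum() says what you mean with no reduce at all:
total_sum = sum(lst)

print(total_lambda, total_operator, total_sum)  # 45 45 45
```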
Answer 9:
This is quite fast:
def binary_search(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        midval = a[mid]
        if midval < x:
            lo = mid+1
        elif midval > x:
            hi = mid
        else:
            return mid
    return -1
time1 = time.time()
N = [x for x in T if binary_search(S, x) >= 0]
time2 = time.time()
print 'time diff binary search=', time2-time1
Simply put: fewer comparisons, less time.
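The same idea is available without a hand-rolled search via the standard bisect module; a minimal sketch (Python 3 syntax, my addition), which assumes, as the answer does, that S is sorted:

```python
import bisect

S = list(range(1000000))        # sorted, as in the question
T = [y**2 for y in range(300)]

def contains(a, x):
    """Binary-search membership test on a sorted list a."""
    i = bisect.bisect_left(a, x)
    return i < len(a) and a[i] == x

N = [x for x in T if contains(S, x)]
print(len(N))  # 300: every square of 0..299 is below 1,000,000
```

For unsorted data, though, converting to a set is both simpler and faster than sorting just to enable binary search.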
Answer 10:
List comprehensions can make a bigger difference if you have to process your filtered results. In your case you just build a list, but if you had to do something like this:
n = [f(i) for i in S if some_condition(i)]
you would gain from the LC optimization over this:
n = map(f, filter(some_condition, S))
simply because the latter has to build an intermediate list (or tuple, or string, depending on the nature of S). As a consequence, you will also notice a different impact on the memory used by each method; the LC will keep it lower.
The lambda in itself does not matter.
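To push that point one step further: a generator expression avoids building even the LC's one list when the result is consumed immediately. A small sketch (Python 3 syntax; f and some_condition are hypothetical stand-ins for the answer's placeholders):

```python
# Hypothetical filter/transform pair, just for illustration.
def some_condition(i):
    return i % 2 == 0

def f(i):
    return i * i

S = range(20)

# List comprehension: builds one list in memory.
n_list = [f(i) for i in S if some_condition(i)]

# Generator expression: builds no list at all when fed
# straight into a consumer such as sum().
n_sum = sum(f(i) for i in S if some_condition(i))

print(n_sum)  # sum of squares of the even numbers below 20
```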
Source: lambda versus list comprehension performance