Question:
I would like to intersect two lists in Python (2.7). I need the result to be iterable:
list1 = [1,2,3,4]
list2 = [3,4,5,6]
result = (3,4) # any kind of iterable
Provided that a full iteration will be performed immediately after the intersection, which of the following is more efficient?
Using a generator:
result = (x for x in list1 if x in list2)
Using filter():
result = filter(lambda x: x in list2, list1)
Other suggestions?
Thanks in advance,
Amnon
Answer 1:
Neither of these. The best way is to use sets.
list1 = [1,2,3,4]
list2 = [3,4,5,6]
result = set(list1).intersection(list2)
Sets are iterable, so no need to convert the result into anything.
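A minimal sketch of the set approach above, written for Python 3 (the question targets 2.7, where `print` would need adjusting). Note that the result is a set, so duplicates are dropped and iteration order is not guaranteed; `sorted()` is used here only to make the printed order predictable.

```python
list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]

# intersection() accepts any iterable, so only list1 has to be converted.
result = set(list1).intersection(list2)

# A set is iterable, but unordered; sort it if a stable order is needed.
for x in sorted(result):
    print(x)
```

If the original order or duplicate items matter, a set-based membership test combined with a list comprehension may be a better fit.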
Answer 2:
Your solution has a complexity of O(m*n), where m and n are the respective lengths of the two lists. You can improve the complexity to O(m+n) using a set for one of the lists:
s = set(list1)
result = [x for x in list2 if x in s]
In cases where speed matters more than readability (that is, almost never), you can also use
result = filter(set(a).__contains__, b)
which is about 20 percent faster than the other solutions on my machine.
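The O(m+n) idea can be sketched as a small helper. The function name `intersect_ordered` is made up for illustration, and the code assumes Python 3 and hashable list items. Unlike a plain set intersection, it keeps the order and any duplicates of the second list.

```python
def intersect_ordered(first, second):
    """Return the items of `second` that also occur in `first`, in `second`'s order."""
    lookup = set(first)                        # O(m) to build
    return [x for x in second if x in lookup]  # O(n) scan, O(1) average lookup


list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]
print(intersect_ordered(list1, list2))

# In Python 3, filter() returns a lazy iterator rather than a list,
# so wrap it in list() if a list is needed.
lazy = filter(set(list1).__contains__, list2)
print(list(lazy))
```

Both calls print `[3, 4]` for these inputs.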
Answer 3:
For lists, the most efficient way is to use:
result = set(list1).intersection(list2)
as mentioned, but for numpy arrays, the intersect1d function is more efficient:
import numpy as np
result = np.intersect1d(list1, list2)
In particular, when you know that the lists don't have duplicate values, you can use it as:
result = np.intersect1d(list1, list2, assume_unique=True)
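For reference, the NumPy routine is spelled `np.intersect1d`. A short sketch, assuming NumPy is installed, showing that it returns a sorted array of the unique common values rather than a list or set:

```python
import numpy as np

list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]

# Returns a sorted ndarray of the unique values present in both inputs.
result = np.intersect1d(list1, list2)

# assume_unique=True skips de-duplication; only safe when neither
# input contains repeated values, as is the case here.
unique_result = np.intersect1d(list1, list2, assume_unique=True)

print(result.tolist())
print(unique_result.tolist())
```

Both lines print `[3, 4]`. When the inputs are plain Python lists, NumPy first has to convert them to arrays, which may offset the gain for small inputs.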
Answer 4:
I tried to compare the speed of 3 methods of list intersection:
import random
a = [random.randint(0, 1000) for _ in range(1000)]
b = [random.randint(0, 1000) for _ in range(1000)]
Solution 1: list comprehension
Time elapsed: 8.95265507698059
import time
start = time.time()
for _ in range(1000):
    result = [x for x in a if x in b]
elapsed = time.time() - start
print(elapsed)
Solution 2: set
Time elapsed: 0.09089064598083496
start = time.time()
for _ in range(1000):
    result = set.intersection(set(a), set(b))
elapsed = time.time() - start
print(elapsed)
Solution 3: numpy.intersect1d
Time elapsed: 0.323300838470459
import numpy as np
start = time.time()
for _ in range(1000):
    result = np.intersect1d(a, b)
elapsed = time.time() - start
print(elapsed)
Conclusion
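I think set.intersection is the fastest way: in this test it was roughly 100 times faster than the list comprehension and about 3.5 times faster than numpy.intersect1d.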
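The comparison above could also be run with the standard `timeit` module, which is designed for this kind of measurement. This is only a sketch for Python 3: the function names, the fixed seed, and the repeat count of 20 are arbitrary choices, and the absolute numbers will differ from machine to machine.

```python
import random
import timeit

random.seed(0)  # fixed seed so repeated runs use the same data
a = [random.randint(0, 1000) for _ in range(1000)]
b = [random.randint(0, 1000) for _ in range(1000)]


def with_list_scan():
    # O(m*n): every membership test scans list b.
    return [x for x in a if x in b]


def with_set_intersection():
    # Returns a set: unordered, duplicates removed.
    return set(a).intersection(b)


def with_set_lookup():
    # O(m+n): keeps the order and duplicates of a.
    lookup = set(b)
    return [x for x in a if x in lookup]


for fn in (with_list_scan, with_set_intersection, with_set_lookup):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

Note that the three functions do not return identical objects: the list-based ones preserve order and duplicates, while the set intersection does not, so the choice may depend on more than speed.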