Speed differences between intersection() and '

Which one of these is faster? Is one "better"? Basically, I'll have two sets and I eventually want to get a single match between them. So really, I suppose the for loop is more like:

for object in set:
    if object in other_set:
        return object

Like I said, I only need one match, but I'm not sure how intersection() is handled, so I don't know if it's any better. Also, if it helps, other_set is a list of nearly 100,000 elements and set has maybe a few hundred, at most a few thousand.

Tags: python performance data-structures set intersection

3 Answers

兄弟一词,经得起流年.

#2 · 2019-03-20 16:01

I wrote a simple utility that checks if two sets have at least one element in common. I had the same optimization problem today and your post saved my day. This is just a way to thank you for pointing this out, hope this will help other people too :)

Note: the utility does NOT return the first common element. It returns True if the two sets have at least one element in common and False otherwise. Of course, it can easily be adapted to meet your goal.

def nonEmptyIntersection(A, B):
    """
    Returns true if set A intersects set B.
    """
    smaller, bigger = A, B
    if len(B) < len(A):
        smaller, bigger = bigger, smaller
    for e in smaller:
        if e in bigger:
            return True
    return False
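
For the asker's actual goal (getting one common element rather than a yes/no answer), the same idea could be adapted roughly like this. It is only a sketch: the name first_common and the default argument are my own choices for illustration, and it assumes that at least the larger argument is a set so that the membership test stays cheap.

```python
def first_common(a, b, default=None):
    """Return some element present in both a and b, or default if there is none."""
    # Iterate over the smaller collection and probe the bigger one
    smaller, bigger = (a, b) if len(a) <= len(b) else (b, a)
    for e in smaller:
        if e in bigger:
            return e  # stop at the first hit
    return default

x = first_common({1, 2, 3}, {3, 4, 5, 6})   # the only shared element is 3
y = first_common({1, 2}, {3, 4})            # nothing shared, falls back to default

# For the pure yes/no question, the built-in set method isdisjoint also works
has_common = not {1, 2, 3}.isdisjoint({3, 4})
```

Which element comes back when several are shared depends on set iteration order, so treat the result as "some match", not "the first match" in any meaningful sense.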


ら.Afraid

#3 · 2019-03-20 16:16

Your code is fine. Item lookup (if object in other_set) is quite efficient for sets, since it is a hash lookup that takes constant time on average.
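
Since the question says other_set is actually a list of about 100,000 items, here is a rough sketch of why converting it to a set once may be worth it. The names big_list, big_set and probes are made up for illustration, and the data is just a stand-in sized like the question describes. Actual timings will vary by machine, so none are quoted here.

```python
from timeit import timeit

# Hypothetical stand-in data, sized like the question describes
big_list = list(range(100000))
big_set = set(big_list)          # one-time conversion, linear in the list size
probes = [99999, 50000, -1]      # a late hit, a middle hit, and a miss

def lookups_in_list():
    # Each "in" scans the list from the front until it finds the value
    return [p in big_list for p in probes]

def lookups_in_set():
    # Each "in" is a hash lookup
    return [p in big_set for p in probes]

# Both give the same answers; only the cost differs
same_answers = lookups_in_list() == lookups_in_set()

list_time = timeit(lookups_in_list, number=20)
set_time = timeit(lookups_in_set, number=20)
print("list:", list_time, "set:", set_time)
```

If the big collection is only ever queried once, the conversion itself costs about as much as one full scan, so it mainly pays off when many lookups follow.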


Evening l夕情丶

#4 · 2019-03-20 16:18

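from timeit import timeit

# Shared setup: a is a reversed list of 100000 ints, b is 1000 values sampled from a
setup = """
from random import sample
a = list(range(100000))
b = sample(a, 1000)
a.reverse()
"""

# Loop version: return the first element of b that is also in a
forin = setup + """
def forin():
    # a = set(a)
    for obj in b:
        if obj in a:
            return obj
"""

# Intersection version: build the whole intersection, then take one element
setin = setup + """
def setin():
    # original method:
    # return tuple(set(a) & set(b))[0]
    # suggested in comment, doesn't change conclusion:
    return next(iter(set(a) & set(b)))
"""

# print(...) with a single argument works on both Python 2 and Python 3
print(timeit("forin()", forin, number=100))
print(timeit("setin()", setin, number=100))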

Times (in seconds; in each pair the first number is forin, the second is setin):

>>>
0.0929054012768
0.637904308732
>>>
0.160845057616
1.08630760484
>>>
0.322059185123
1.10931801261
>>>
0.0758695262169
1.08920981403
>>>
0.247866360526
1.07724461708
>>>
0.301856152688
1.07903130641

Converting both to sets in the setup and running 10000 iterations instead of 100 yields:

>>>
0.000413064976328
0.152831597075
>>>
0.00402408388788
1.49093627898
>>>
0.00394538156695
1.51841512101
>>>
0.00397715579584
1.52581949403
>>>
0.00421472926155
1.53156769646

So your version is much faster, regardless of whether it makes sense to convert them to sets.
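
One thing to keep in mind when reading these numbers: b is sampled from a, so every element of b is guaranteed to be in a and the loop returns on its very first iteration, while the & operator always builds the complete intersection before anything can be picked from it. The sketch below makes that visible by counting membership checks. The helper name first_match_with_count and the variables small, big and disjoint are hypothetical, chosen only for this illustration.

```python
import random

def first_match_with_count(small, big):
    """Return (match, number of membership checks performed)."""
    checked = 0
    for obj in small:
        checked += 1
        if obj in big:
            return obj, checked   # short-circuit on the first hit
    return None, checked

big = set(range(100000))
small = set(random.sample(range(100000), 1000))   # every element is also in big

match, checked = first_match_with_count(small, big)
full = small & big            # has to collect all 1000 common elements

# Worst case for the loop: nothing in common, so every element gets checked
disjoint = set(range(100000, 101000))
miss, checked_miss = first_match_with_count(disjoint, big)

print(checked, len(full), checked_miss)
```

So the gap measured above is close to a best case for the loop. With few or no common elements the loop has to walk through the whole smaller collection, and the difference to intersection() would be expected to shrink.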

