Profiled performance of len(set) vs. set.__len__()

Posted 2019-02-12 22:51

While profiling my Python application, I've discovered that len() seems to be very expensive when used on sets. See the code below:

import cProfile

def lenA(s):
    for i in range(1000000):
        len(s)

def lenB(s):
    for i in range(1000000):
        s.__len__()

def main():
    s = set()
    lenA(s)
    lenB(s)

if __name__ == "__main__":
    cProfile.run("main()", "stats")

According to the profiler stats below, lenA() appears to be about 14 times slower than lenB():

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    1.986    1.986    3.830    3.830  .../lentest.py:5(lenA)
1000000    1.845    0.000    1.845    0.000  {built-in method len}
      1    0.273    0.273    0.273    0.273  .../lentest.py:9(lenB)

Am I missing something? Currently I use __len__() instead of len(), but the code looks dirty :(

3 Answers
\"骚年 ilove
2楼-- · 2019-02-12 23:19

This was going to be a comment, but after larsman's comment on his controversial results, and given the result I got myself, I think it is worth adding my data to the thread.

Trying more or less the same setup, I got the opposite of what the OP got, in the same direction that larsman commented:

12.1964105975   <- __len__
6.22144670823   <- len()

C:\Python26\programas>

The test:

def lenA(s):
    for i in range(100):
        len(s)

def lenB(s):
    for i in range(100):
        s.__len__()

s = set()

if __name__ == "__main__":
    from timeit import timeit
    print timeit("lenB(s)", setup="from __main__ import lenB, s")
    print timeit("lenA(s)", setup="from __main__ import lenA, s")

This is ActivePython 2.6.7 64-bit on Windows 7.

你好瞎i
Answer 3 · 2019-02-12 23:21

This is an interesting observation about the profiler, which has nothing to do with the actual performance of the len function. You see, in the profiler stats, there are two lines concerning lenA:

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1    1.986    1.986    3.830    3.830  .../lentest.py:5(lenA)
1000000    1.845    0.000    1.845    0.000  {built-in method len}

...while there is only one line concerning lenB:

      1    0.273    0.273    0.273    0.273  .../lentest.py:9(lenB)

The profiler has timed each single call from lenA to len, but timed lenB as a whole. Timing a call always adds some overhead; in the case of lenA you see this overhead multiplied a million times.
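
To see the instrumentation cost in isolation, here is a minimal sketch (separate from the OP's code; the function names are just illustrative): it profiles a loop that makes a million trivial calls next to a loop that does the same arithmetic inline. The called version should show noticeably more time in the profiler output, even though the work is essentially identical, because every call is timed individually.

import cProfile

def trivial(i):
    return i + 1

def call_loop():
    # one million instrumented calls -- the profiler times every one of them
    for i in range(1000000):
        trivial(i)

def inline_loop():
    # the same work done inline, with no calls for the profiler to intercept
    for i in range(1000000):
        i + 1

cProfile.run("call_loop()")
cProfile.run("inline_loop()")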

你好瞎i
Answer 4 · 2019-02-12 23:32

Obviously, len has some overhead, since it does a function call and translates AttributeError to TypeError. Also, set.__len__ is such a simple operation that it's bound to be very fast in comparison to just about anything, but I still don't find anything like the 14x difference when using timeit:

In [1]: s = set()

In [2]: %timeit s.__len__()
1000000 loops, best of 3: 197 ns per loop

In [3]: %timeit len(s)
10000000 loops, best of 3: 130 ns per loop
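
As a side note on the "translates AttributeError to TypeError" point above, here is a quick sketch (the NoLen class is purely illustrative) showing that the two spellings fail with different exceptions on an object that has no __len__ at all:

class NoLen(object):
    pass

obj = NoLen()

try:
    len(obj)
except TypeError as exc:
    # len() reports a TypeError for objects without a length
    print("len() raised: %r" % exc)

try:
    obj.__len__()
except AttributeError as exc:
    # calling the dunder directly surfaces a plain AttributeError
    print("__len__() raised: %r" % exc)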

You should always just call len, not __len__. If the call to len is the bottleneck in your program, you should rethink its design, e.g. cache sizes somewhere or calculate them without calling len.
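
For the caching suggestion, a minimal sketch (the function and variable names are hypothetical, not from the original post): when the container is not modified inside a hot loop, compute its length once before the loop instead of on every iteration.

def scale_by_size(items, s):
    # s is not modified inside the loop, so its length is computed once
    size = len(s)
    result = []
    for item in items:
        result.append(item * size)  # instead of item * len(s) on every pass
    return result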
