Python 3.5 vs. 3.6 what made “map” slower compared

2019-02-03 17:41发布

I sometimes used map if there was a function/method that was written in C to get a bit extra performance. However I recently revisited some of my benchmarks and noticed that the relative performance (compared to a similar list comprehension) drastically changed between Python 3.5 and 3.6.

That's not the actual code but just a minimal sample that illustrates the difference:

import random

lst = [random.randint(0, 10) for _ in range(100000)]
assert list(map((5).__lt__, lst)) == [5 < i for i in lst]
%timeit list(map((5).__lt__, lst))
%timeit [5 < i for i in lst]

I realize that it's not a good idea to use (5).__lt__ but I couldn't come up with a useful example right now.

The timings on Python-3.5 were in favor of the map approach:

15.1 ms ± 5.64 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.7 ms ± 35.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

While the Python-3.6 timings actually show that the comprehension is faster:

17.9 ms ± 755 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
14.3 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

My question is what happened in this case that made the list-comprehension faster and the map solution slower? I realize the difference isn't that much, it just made me curious because that was one of the "tricks" I sometimes (actually seldom) used in performance critical codes.

2条回答
趁早两清
2楼-- · 2019-02-03 17:59

I think a fair comparison involves using the same function and same testing conditions in Python 3.5 and 3.6 as well as when comparing map to list comprehension in a chosen Python version.

In my initial answer I have performed multiple tests that showed that map was still faster by a factor of about two in both versions of Python when compared to list comprehension. However some results were not conclusive and so I performed some more tests.

First let me cite some of your points stated in the question:

"... [I] noticed that the relative performance [of map] (compared to a similar list comprehension) drastically changed between Python 3.5 and 3.6"

You also ask:

"My question is what happened in this case that made the list-comprehension faster and the map solution slower?"

It is not very clear if you mean that map is slower than list comprehension in Python 3.6 or if you mean that map is slower in Python 3.6 than in 3.5 and list comprehension's performance has increased (albeit not necessarily to the level of beating map).

Based on more extensive tests that I have performed after my first answer to this question, I think I have an idea of what is going on.

However, first let's create conditions for "fair" comparisons. For this we need to:

  1. Compare performance of map in different Python versions using the same function;

  2. Compare performance of map to list comprehension in the same version using same function;

  3. Run the tests on same data;

  4. Minimize contribution from timing functions.

Here is version information about my system:

Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar  6 2017, 12:15:08) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
IPython 5.3.0 -- An enhanced Interactive Python.

and

Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

Let's first address the issue of "same data". Unfortunately because you effectively are using seed(None), each data set lst is different on each of the two versions of Python. This probably contributes to the difference in performance seen on two Python versions. One fix would be to set, e.g., random.seed(0) (or something like that). I chose to create the list once and save it using numpy.save() and then load it in each version. This is especially important because I chose to modify your tests slightly (number of "loops" and "repeats") and I have increased the length of your dataset to 100,000,000:

import numpy as np
import random
lst = [random.randint(0, 10) for _ in range(100000000)]
np.save('lst', lst, allow_pickle=False)

Second, let's use timeit module instead of IPython's magic command %timeit. The reason for doing this comes from the following test performed in Python 3.5:

In [11]: f = (5).__lt__
In [12]: %timeit -n1 -r20 [f(i) for i in lst]
1 loop, best of 20: 9.01 s per loop

Compare this to the result of timeit in same version of Python:

>>> t = timeit.repeat('[f(i) for i in lst]', setup="f = (5).__lt__;
... import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, 
... number=1); print(min(t), max(t), np.mean(t), np.std(t))
7.442819457995938 7.703615028003696 7.5105415405 0.0550515642854

For unknown to me reasons, IPython's magic %timeit is adding some time compared to timeit package. Therefore, I will use timeit exclusively in my testing.

NOTE: In the discussions that follows I will use only minimum timing (min(t)).

Tests in Python 3.5.3:

Group 1: map and list comprehension tests

>>> import numpy as np
>>> import timeit

>>> t = timeit.repeat('list(map(f, lst))', setup="f = (5).__lt__; import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.666553302988177 4.811194089008495 4.72791638025 0.041115884397

>>> t = timeit.repeat('[f(i) for i in lst]', setup="f = (5).__lt__; import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
7.442819457995938 7.703615028003696 7.5105415405 0.0550515642854

>>> t = timeit.repeat('[5 < i for i in lst]', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.94656751700677 5.07807950800634 5.00670203845 0.0340474956945

>>> t = timeit.repeat('list(map(abs, lst))', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.167273573024431 4.320013975986512 4.2408865186 0.0378852782878

>>> t = timeit.repeat('[abs(i) for i in lst]', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
5.664627838006709 5.837686392012984 5.71560354655 0.0456700607748

Notice how second test (list comprehension using f(i)) is significantly slower than third test (list comprehension using 5 < i) indicating that f = (5).__lt__ is not identical (or almost identical) to 5 < i from the code perspective.

Group 2: "individual" function tests

>>> t = timeit.repeat('f(1)', setup="f = (5).__lt__", repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.052280781004810706 0.05500587198184803 0.0531139718529 0.000877649561967

>>> t = timeit.repeat('5 < 1', repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.030931947025237605 0.033691533986711875 0.0314959864045 0.000633274658428

>>> t = timeit.repeat('abs(1)', repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.04685414198320359 0.05405496899038553 0.0483296330043 0.00162837880358

Notice how again first test (of f(1)) is significantly slower than second test (of 5 < 1) further supporting that f = (5).__lt__ is not identical (or almost identical) to 5 < i from the code perspective.

Tests in Python 3.6.2:

Group 1: map and list comprehension tests

>>> import numpy as np
>>> import timeit

>>> t = timeit.repeat('list(map(f, lst))', setup="f = (5).__lt__; import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.599696700985078 4.743880658003036 4.6631793691 0.0425774678203

>>> t = timeit.repeat('[f(i) for i in lst]', setup="f = (5).__lt__; import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
7.316072431014618 7.572676292009419 7.3837024617 0.0574811241553

>>> t = timeit.repeat('[5 < i for i in lst]', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
4.570452399988426 4.679144663008628 4.61264215875 0.0265541828693

>>> t = timeit.repeat('list(map(abs, lst))', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
2.742673939006636 2.8282236389932223 2.78504617405 0.0260357089928

>>> t = timeit.repeat('[abs(i) for i in lst]', setup="import numpy; lst = numpy.load('lst.npy').tolist()", repeat=20, number=1); print(min(t), max(t), np.mean(t), np.std(t))
6.2177103200228885 6.428813881997485 6.28722427145 0.0493010620999

Group 2: "individual" function tests

>>> t = timeit.repeat('f(1)', setup="f = (5).__lt__", repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.051936342992121354 0.05764096099301241 0.0532974587506 0.00117079475737

>>> t = timeit.repeat('5 < 1', repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.02675032999832183 0.032919151999522 0.0285137565021 0.00156522182488

>>> t = timeit.repeat('abs(1)', repeat=20, number=1000000); print(min(t), max(t), np.mean(t), np.std(t))
0.047831349016632885 0.0531779529992491 0.0482893927969 0.00112825297875

Notice how again first test (of f(1)) is significantly slower than second test (of 5 < 1) further supporting that f = (5).__lt__ is not identical (or almost identical) to 5 < i from the code perspective.

Discussion

I do not know how reliable are these timing tests and it is also difficult to separate all factors that contribute to these timing results. However we can notice from the "Group 2" of tests that the only "individual" test that significantly changed its timing is the test of 5 < 1: it went down to 0.0268s in Python 3.6 from 0.0309s in Python 3.5. This makes the list comprehension test in Python 3.6 that uses 5 < i to run faster than a similar test in Python 3.5. However, this does not mean that list comprehension become faster in Python 3.6.

Let's compare relative performance of map to list comprehension for the same function in the same Python version. Then we get in Python 3.5: r(f) = 7.4428/4.6666 = 1.595, r(abs) = 5.665/4.167 = 1.359 and in Python 3.6: r(f) = 7.316/4.5997 = 1.591, r(abs) = 6.218/2.743 = 2.267. Based on these relative performances we can see that in Python 3.6 performance of the map relative to the performance of list comprehension is at least the same as in Python 3.5 for the f = (5).__lt__ function and this ratio has even improved for a function such as abs() in Python 3.6.

In any case, I believe that there is no evidence that list comprehension became faster in Python 3.6 neither in the relative nor in the absolute sense. The only performance improvement is for [5 < i for i in lst] test but that is because 5 < i itself became faster in Python 3.6 and not due to the list comprehension itself being faster.

查看更多
爷、活的狠高调
3楼-- · 2019-02-03 18:03

I think a fair comparison would involve using the same function. In the case of your example, when comparison is fair, map still wins:

>>> import sys
>>> print(sys.version)
3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
>>> import random
>>> lst = [random.randint(0, 10) for _ in range(100000)]
>>> assert list(map((5).__lt__, lst)) == [5 < i for i in lst]
>>> f = (5).__lt__
>>> %timeit list(map(f, lst))
4.63 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit [f(i) for i in lst]
9.17 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

While in Python 3.5 (at least on my system) map is faster than in Python 3.6, so is list comprehension:

>>> print(sys.version)
3.5.3 |Continuum Analytics, Inc.| (default, Mar  6 2017, 12:15:08) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
>>> %timeit list(map(f, lst))
100 loops, best of 3: 4.36 ms per loop
>>> %timeit [f(i) for i in lst]
100 loops, best of 3: 8.12 ms per loop

Still, when using the same function, map is ~2x faster than list comprehension in both Python 3.5 and 3.6.

EDIT (Response to @user2357112 comments):

It is my opinion that performing "fair" comparisons is important in answering OP's question: "My question is what happened in this case that made the list-comprehension faster and the map solution slower?" (last paragraph). However, in the first paragraph, @MSeifert says: "... [I] noticed that the relative performance (compared to a similar list comprehension) drastically changed between Python 3.5 and 3.6" That is, the comparison is between a map and list comprehension. Yet, @MSeifert tests are set-up as follows:

timig_map_35 = Timing(list(map(f, lst)))
timing_list_35 = Timing([g(i) for i in lst])

This kind of testing makes it difficult to find the cause of timing differences: are they because list comprehension became faster in 3.6 or map became slower in 3.6 or f(i) is slower in 3.6 or g(i) is faster in 3.6...

Therefore, I proposed to introduce f = (5).__lt__ and use the same function in both map and list comprehension tests. I have also modified @MSeifert test by increasing the number of elements in the list and redusing number of "loops" in the timeit:

import random
lst = [random.randint(0, 10) for _ in range(1000000)] # 10x more elements
f = (5).__lt__
%timeit -n1 -r1000 list(map(f, lst)) # f = (5).__lt__
%timeit -n1 -r1000 [f(i) for i in lst] # f(i) = (5).__lt__(i)
%timeit -n1 -r1000 [5 < i for i in lst] # g(i) = 5 < i
%timeit -n1 -r1000 [1 for _ in lst] # h(i) = 1

In Python 3.6 I get:

43.5 ms ± 1.79 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
82.2 ms ± 2.39 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
43.6 ms ± 1.64 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
23.8 ms ± 1.27 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)

In Python 3.5 I get:

1 loop, best of 1000: 43.7 ms per loop
1 loop, best of 1000: 78.9 ms per loop
1 loop, best of 1000: 46 ms per loop
1 loop, best of 1000: 26.8 ms per loop

In my opinion this shows that list comprehension is slightly faster in 3.6 than in 3.5 except when f is used. Therefore, it is difficult to conclude that it is the map that is slower in Python 3.6 or is first timeit above is slower because of the call to f being slower. Therefore I performed two more tests:

%timeit -n1 -r1000 list(map(abs, lst))
%timeit -n1 -r1000 [abs(i) for i in lst]
%timeit -n1000000 -r1000 f(1)

In Python 3.6 I get:

25.8 ms ± 1.42 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
67.1 ms ± 2.07 ms per loop (mean ± std. dev. of 1000 runs, 1 loop each)
64.7 ns ± 2.22 ns per loop (mean ± std. dev. of 1000 runs, 1000000 loops each)

In Python 3.5 I get:

1 loop, best of 1000: 38.3 ms per loop
1 loop, best of 1000: 56.4 ms per loop
1000000 loops, best of 1000: 59.6 ns per loop

This shows that map can be significantly faster than list comprehension for some functions: specifically, for abs(x) relative performance of map versus "list comprehension" in Python 3.6 is 67.1/25.8 = 2.60 while in Python 3.5 it is 56.4/38.3 = 1.47. Therefore it is interesting to know why @MSeifert test shows that map is slower in Python 3.6. My last test above shows timing test for f(1) "alone". I am not sure how valid this test is (unfortunately) - I wanted to avoid using map or [for] to eliminate one variable - but it shows that in Python 3.6 f = (5).__lt__ became slower than in Python 3.5. Therefore I conclude that it is the particular form of the function f ((5).__lt__) whose evaluation slowed and not the map function. I know this last "alone" test is likely a bad test, however, the fact that map is very fast (relatively or absolutely) when used with abs shows that the problem is in f and not in map.

NOTE: Python 3.5 uses IPython 5.3.0 and Python 3.6 uses IPython 6.1.0.

查看更多
登录 后发表回答