The issue of the performance of the python random module, and in particular, random.sample
and random.shuffle
came up in this question. On my computer, I get the following results:
> python -m timeit -s 'import random' 'random.randint(0,1000)'
1000000 loops, best of 3: 1.07 usec per loop
> python3 -m timeit -s 'import random' 'random.randint(0,1000)'
1000000 loops, best of 3: 1.3 usec per loop
That's more than a 20% degradation of performance in python3 vs python2. It gets much worse.
> python -m timeit -s 'import random' 'random.shuffle(list(range(10)))'
100000 loops, best of 3: 3.85 usec per loop
> python3 -m timeit -s 'import random' 'random.shuffle(list(range(10)))'
100000 loops, best of 3: 8.04 usec per loop
> python -m timeit -s 'import random' 'random.sample(range(10),3)'
100000 loops, best of 3: 2.4 usec per loop
> python3 -m timeit -s 'import random' 'random.sample(range(10),3)'
100000 loops, best of 3: 6.49 usec per loop
That represents a 100% degradation in performance for random.shuffle
, and almost a 200% degradation for random.sample
. That is quite severe.
I used python 2.7.9 and python 3.4.2 in the above tests.
My question: What happened to the random
module in python3?
----------- What Changed -----------------------------------------------
Several things happened:
Integers became less efficient in the int/long unification. That is also why integers are 28 bytes wide now instead of 24 bytes on 64-bit Linux/MacOS builds.
Shuffle became more accurate by using
_randbelow
. This eliminated a subtle bias in the previous algorithm.Indexing became slower because the special case for integer indices was removed from ceval.c primarily because it was harder to do with the newer integers and because a couple of the core devs didn't think the optimization was worth it.
The range() function was replaced with xrange(). This is relevant because the OP's timings both use range() in the inner-loop.
The algorithms for shuffle() and sample() were otherwise unchanged.
Python 3 made a number of changes like unicode-everywhere that made the internals more complex, a little slower, and more memory intensive. In return, Python 3 makes it easier for users to write correct code.
Likewise, int/long unification made the language simpler but at a cost of speed and space. The switch to using
_randbelow()
in the random module had a runtime cost but benefited in terms of accuracy and correctness.----------- Conclusion --------------------------------------------------
In short, Python 3 is better in some ways that matter to many users and worse in some ways that people rarely notice. Engineering is often about trade-offs.
----------- Details ---------------------------------------------------------
Python2.7 code for shuffle():
Python3.6 code for shuffle():
Python 2.7 integer size:
Python 3.6 integer size:
Relative speed of indexed lookups (binary subscriptions with integer arguments indexing into a list):
Python 2.7 code in ceval.c with an optimization for indexed lookups:
Python 3.6 code in ceval.c without the optimization for indexed lookups: