python 2 vs python 3 performance of random, partic

2019-03-19 03:01发布

问题:

The issue of the performance of the python random module, and in particular, random.sample and random.shuffle came up in this question. On my computer, I get the following results:

> python  -m timeit -s 'import random' 'random.randint(0,1000)'
1000000 loops, best of 3: 1.07 usec per loop
> python3 -m timeit -s 'import random' 'random.randint(0,1000)'
1000000 loops, best of 3: 1.3 usec per loop

That's more than a 20% degradation of performance in python3 vs python2. It gets much worse.

> python  -m timeit -s 'import random' 'random.shuffle(list(range(10)))'
100000 loops, best of 3: 3.85 usec per loop
> python3 -m timeit -s 'import random' 'random.shuffle(list(range(10)))'
100000 loops, best of 3: 8.04 usec per loop

> python  -m timeit -s 'import random' 'random.sample(range(10),3)'
100000 loops, best of 3: 2.4 usec per loop
> python3 -m timeit -s 'import random' 'random.sample(range(10),3)'
100000 loops, best of 3: 6.49 usec per loop

That represents a 100% degradation in performance for random.shuffle, and almost a 200% degradation for random.sample. That is quite severe.


I used python 2.7.9 and python 3.4.2 in the above tests.

My question: What happened to the random module in python3?

回答1:

----------- What Changed -----------------------------------------------

Several things happened:

  • Integers became less efficient in the int/long unification. That is also why integers are 28 bytes wide now instead of 24 bytes on 64-bit Linux/MacOS builds.

  • Shuffle became more accurate by using _randbelow. This eliminated a subtle bias in the previous algorithm.

  • Indexing became slower because the special case for integer indices was removed from ceval.c primarily because it was harder to do with the newer integers and because a couple of the core devs didn't think the optimization was worth it.

  • The range() function was replaced with xrange(). This is relevant because the OP's timings both use range() in the inner-loop.

The algorithms for shuffle() and sample() were otherwise unchanged.

Python 3 made a number of changes like unicode-everywhere that made the internals more complex, a little slower, and more memory intensive. In return, Python 3 makes it easier for users to write correct code.

Likewise, int/long unification made the language simpler but at a cost of speed and space. The switch to using _randbelow() in the random module had a runtime cost but benefited in terms of accuracy and correctness.

----------- Conclusion --------------------------------------------------

In short, Python 3 is better in some ways that matter to many users and worse in some ways that people rarely notice. Engineering is often about trade-offs.

----------- Details ---------------------------------------------------------

Python2.7 code for shuffle():

def shuffle(self, x, random=None):
    if random is None:
        random = self.random
    _int = int
    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = _int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

Python3.6 code for shuffle():

def shuffle(self, x, random=None):
    if random is None:
        randbelow = self._randbelow
        for i in reversed(range(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = randbelow(i+1)              # <-- This part changed
            x[i], x[j] = x[j], x[i]
    else:
        _int = int
        for i in reversed(range(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = _int(random() * (i+1))
            x[i], x[j] = x[j], x[i]

Python 2.7 integer size:

>>> import sys
>>> sys.getsizeof(1)
24

Python 3.6 integer size:

>>> import sys
>>> sys.getsizeof(1)
28

Relative speed of indexed lookups (binary subscriptions with integer arguments indexing into a list):

$ python2.7 -m timeit -s 'a=[0]' 'a[0]'
10000000 loops, best of 3: 0.0253 usec per loop
$ python3.6 -m timeit -s 'a=[0]' 'a[0]'
10000000 loops, best of 3: 0.0313 usec per loop

Python 2.7 code in ceval.c with an optimization for indexed lookups:

    TARGET_NOARG(BINARY_SUBSCR)
    {
        w = POP();
        v = TOP();
        if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {
            /* INLINE: list[int] */
            Py_ssize_t i = PyInt_AsSsize_t(w);
            if (i < 0)
                i += PyList_GET_SIZE(v);
            if (i >= 0 && i < PyList_GET_SIZE(v)) {
                x = PyList_GET_ITEM(v, i);
                Py_INCREF(x);
            }
            else
                goto slow_get;
        }
        else
          slow_get:
            x = PyObject_GetItem(v, w);
        Py_DECREF(v);
        Py_DECREF(w);
        SET_TOP(x);
        if (x != NULL) DISPATCH();
        break;
    }

Python 3.6 code in ceval.c without the optimization for indexed lookups:

    TARGET(BINARY_SUBSCR) {
        PyObject *sub = POP();
        PyObject *container = TOP();
        PyObject *res = PyObject_GetItem(container, sub);
        Py_DECREF(container);
        Py_DECREF(sub);
        SET_TOP(res);
        if (res == NULL)
            goto error;
        DISPATCH();
    }