According to this answer lists perform better than generators in a number of cases, for example when used together with str.join
(since the algorithm needs to pass over the data twice).
In the following example using a list comprehension seems to yield better performance than using a corresponding generator expression though intuitively the list comprehension comes with an overhead of allocating and copying to additional memory which the generator sidesteps.
In [1]: l = list(range(2_000_000))
In [2]: %timeit l[:] = [i*3 for i in range(len(l))]
190 ms ± 4.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit l[:] = (i*3 for i in range(len(l)))
261 ms ± 7.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [4]: %timeit l[::2] = [i*3 for i in range(len(l)//2)]
97.1 ms ± 2.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [5]: %timeit l[::2] = (i*3 for i in range(len(l)//2))
129 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [6]: %timeit l[:len(l)//2] = [i*3 for i in range(len(l)//2)]
92.6 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [7]: %timeit l[:len(l)//2] = (i*3 for i in range(len(l)//2))
118 ms ± 2.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Why does a list comprehension yield better performance in these cases?
This answer concerns CPython implementation only. Using a list comprehension is faster, since the generator is first converted into a list anyway. This is done because the length of the sequence should be determined before proceeding to replace data, and a generator can't tell you its length.
For list slice assignment, this operation is handled by the amusingly named list_ass_slice
. There is a special-case handling for assigning a list or tuple, here - they can use PySequence_Fast
ops.
This is the v3.7.4 implementation of PySequence_Fast
, where you can clearly see a type-check for list or tuples:
PyObject *
PySequence_Fast(PyObject *v, const char *m)
{
PyObject *it;
if (v == NULL) {
return null_error();
}
if (PyList_CheckExact(v) || PyTuple_CheckExact(v)) {
Py_INCREF(v);
return v;
}
it = PyObject_GetIter(v);
if (it == NULL) {
if (PyErr_ExceptionMatches(PyExc_TypeError))
PyErr_SetString(PyExc_TypeError, m);
return NULL;
}
v = PySequence_List(it);
Py_DECREF(it);
return v;
}
A generator expression will fail this type check and continue to the fallback code, where it is converted into a list object, so that the length can be predetermined.
In the general case, a predetermined length is desirable in order to allow efficient allocation of list storage, and also to provide useful error messages with extended slice assignment:
>>> vals = (x for x in 'abc')
>>> L = [1,2,3]
>>> L[::2] = vals # attempt assigning 3 values into 2 positions
---------------------------------------------------------------------------
Traceback (most recent call last)
...
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
>>> L # data unchanged
[1, 2, 3]
>>> list(vals) # generator was fully consumed
[]