I was playing around with timeit and noticed that a simple list comprehension over a small string took longer than the same operation on a list of small single-character strings. Any explanation? It takes almost 1.35 times as long.
>>> from timeit import timeit
>>> timeit("[x for x in 'abc']")
2.0691067844831528
>>> timeit("[x for x in ['a', 'b', 'c']]")
1.5286479570345861
What's happening on a lower level that's causing this?
I cannot confirm the results for Python 2: there it doesn't seem to make a difference whether you iterate over strings or lists... and tuples are pretty speedy!
TL;DR

- The actual speed difference is closer to 70% (or more) once a lot of the overhead is removed, for Python 2.
- Object creation is not at fault. Neither method creates a new object, as one-character strings are cached.
- The difference is unobvious, but is likely created from a greater number of checks on string indexing, with regard to the type and well-formedness. It is also quite likely due to the need to check what to return.
- List indexing is remarkably fast.
This disagrees with what you've found...
You must be using Python 2, then.
Let's explain the difference between the versions. I'll examine the compiled code.
For Python 3:
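The disassembly is easy enough to reproduce with the `dis` module; a minimal sketch (the exact opcodes vary by CPython version):

```python
import dis

# Disassemble both benchmarked statements to compare their bytecode.
dis.dis("[x for x in 'abc']")
print("----")
dis.dis("[x for x in ['a', 'b', 'c']]")
```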
You see here that the list variant is likely to be slower because the list is rebuilt from scratch each time the statement runs: that is the `BUILD_LIST` part. The string variant only has to load a constant.
You can check that this does seem to make a difference by using a tuple instead of the list: since tuples are immutable, a tuple of constants is folded into a single compile-time constant, so nothing has to be rebuilt per run. Timing the tuple version brings it back up to speed.
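As a sketch (absolute numbers are machine- and version-dependent, and very recent CPythons may even fold the constant list into a tuple for you, shrinking the gap):

```python
from timeit import timeit

# The list literal may be rebuilt on every run (BUILD_LIST);
# the tuple and the string are compile-time constants (LOAD_CONST).
t_list = timeit("[x for x in ['a', 'b', 'c']]", number=100_000)
t_tuple = timeit("[x for x in ('a', 'b', 'c')]", number=100_000)
t_str = timeit("[x for x in 'abc']", number=100_000)
print(f"list: {t_list:.4f}  tuple: {t_tuple:.4f}  str: {t_str:.4f}")
```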
For Python 2:
The odd thing is that we have the same building of the list, but it's still faster for this. Python 2 is acting strangely fast.
Let's remove the comprehensions and re-time. (The `_ =` is to prevent the expression from getting optimised out.)

We can see that initialization is not significant enough to account for the difference between the versions (those numbers are small)! We can thus conclude that Python 3 has slower comprehensions. This makes sense, as Python 3 changed comprehensions to have safer scoping.
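A sketch of that initialization-only measurement (the `_ =` assignment keeps the literal from being thrown away):

```python
from timeit import timeit

# Time only the construction of each iterable, not any iteration.
t_list = timeit("_ = ['a', 'b', 'c']", number=100_000)
t_str = timeit("_ = 'abc'", number=100_000)
print(t_list, t_str)
```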
Now let's improve the benchmark (I'm just removing overhead that isn't iteration). This removes the repeated building of the iterable by pre-assigning it:
We can check whether calling `iter` is the overhead. No. No it is not. The difference is too small, especially for Python 3.
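A sketch of both checks, pre-building the iterable in `setup` so only the iteration (or the bare `iter` call) is timed:

```python
from timeit import timeit

for setup in ("iterable = 'abc'", "iterable = ['a', 'b', 'c']"):
    # Iteration only: the iterable is built once, in setup.
    loop = timeit("[x for x in iterable]", setup=setup, number=100_000)
    # Is constructing the iterator itself the expensive part?
    make_iter = timeit("iter(iterable)", setup=setup, number=100_000)
    print(setup, loop, make_iter)
```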
So let's remove yet more unwanted overhead... by making the whole thing slower! The aim is just to have a longer iteration so the time hides overhead.
This hasn't actually changed much, but it's helped a little.
So remove the comprehension. It's overhead that's not part of the question:
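For example (the sizes here are arbitrary; the point is just that a longer iterable drowns out fixed per-run overhead), timing a bare loop with no comprehension:

```python
from timeit import timeit

# A longer iterable hides fixed per-run overhead; the bare loop
# also removes the cost of building a result list.
t_str = timeit("for x in iterable: pass",
               setup="iterable = 'abcdefghij' * 100", number=10_000)
t_list = timeit("for x in iterable: pass",
                setup="iterable = list('abcdefghij' * 100)", number=10_000)
print(t_str, t_list)
```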
That's more like it! We can get slightly faster still by using `deque` to iterate. It's basically the same, but it's faster.

What impresses me is that Unicode is competitive with bytestrings. We can check this explicitly by trying `bytes` and `unicode` in both.

For `bytes`:
Here you see Python 3 actually faster than Python 2.
For `unicode`:
Again, Python 3 is faster, although this is to be expected (`str` has had a lot of attention in Python 3).

In fact, this `unicode`-`bytes` difference is very small, which is impressive.

So let's analyse this one case, seeing as it's fast and convenient for me:
We can actually rule out Tim Peters' 10-times-upvoted answer!
These are not new objects!
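A quick check on CPython (single characters in the latin-1 range are cached by the interpreter):

```python
s = "abc"
a = [c for c in s]
b = [c for c in s]
# Identity, not mere equality: the same cached objects come back
# on every pass, so no new objects are being created here.
print(all(x is y for x, y in zip(a, b)))  # True on CPython
```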
But this is worth mentioning: indexing costs. The difference is likely to be in the indexing, so let's remove the iteration and just index:
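A sketch of that indexing-only comparison, with a do-nothing statement as an overhead baseline:

```python
from timeit import timeit

t_str = timeit("iterable[3]", setup="iterable = 'abcdefghij'",
               number=1_000_000)
t_list = timeit("iterable[3]", setup="iterable = list('abcdefghij')",
                number=1_000_000)
# Baseline: roughly the fixed cost of the name lookup and timing loop.
t_base = timeit("iterable", setup="iterable = 'abcdefghij'",
                number=1_000_000)
print(t_str, t_list, t_base)
```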
The difference seems small, but at least half of the cost is fixed overhead, so the remaining speed difference is large enough to decide to blame indexing. I think.
So why is indexing a list so much faster?
Well, I'll come back to you on that, but my guess is that it's down to the check for interned strings (or cached characters, if that's a separate mechanism). That will be less fast than optimal. But I'll go check the source (although I'm not comfortable in C...) :).
So here's the source:
Walking from the top, we'll have some checks. These are boring. Then some assigns, which should also be boring. The first interesting line is the read itself, `ch = PyUnicode_READ(kind, data, index)`, but we'd hope that is fast, as we're reading from a contiguous C array by indexing it. The result, `ch`, will be less than 256, so we'll return the cached character in `get_latin1_char(ch)`.

So we'll run (dropping the first checks):
where `PyUnicode_KIND` is boring (asserts get compiled out in release, non-debug builds [so I can check that they're fast], and `((PyASCIIObject *)(op))->state.kind` is (I think) an indirection and a C-level cast); `PyUnicode_DATA` is also boring for similar reasons, assuming the macros (`Something_CAPITALIZED`) are all fast; `PyUnicode_READ` involves indexes but really isn't slow at all; and then there's `get_latin1_char`.
This confirms my suspicion that the character is cached: `get_latin1_char` just fetches the character from a static cache array. That should be fast. The `if (!unicode)` branch (which fills the cache on first use) is not run, so in this case it's literally equivalent to incrementing the refcount and returning the cached object.
is not run, so it's literally equivalent in this case toHonestly, after testing the
assert
s are fast (by disabling them [I think it works on the C-level asserts...]), the only plausibly-slow parts are:Which are:
(fast, as before),
(fast if the macro
IS_ASCII
is fast), and(also fast as it's an assert plus an indirection plus a cast).
So we're down (the rabbit hole) to the raw memory read itself, which is just an indexed load from the string's buffer. Hmm... that seems fast too...
Well, OK, but let's compare it to `PyList_GetItem`. (Yeah, thanks Tim Peters for giving me more work to do :P.)

We can see that in non-error cases this is just going to run a type check, a bounds check and the fetch itself,
where `PyList_Check` is a one-line type-flag test. (TABS! TABS!!!) (issue21587) — that got fixed and merged in 5 minutes. Like... yeah. Damn. They put Skeet to shame. So this check is normally really trivial (two indirections and a couple of boolean checks), unless `Py_LIMITED_API` is on, in which case... ???

Then there's the indexing and a cast (`((PyListObject *)op) -> ob_item[i]`) and we're done.

So there are definitely fewer checks for lists, and the small speed differences certainly imply that it could be relevant.
I think in general there's just more type-checking and indirection (`->`) for Unicode. It seems I'm missing a point, but what?

When you iterate over most container objects (lists, tuples, dicts, ...), the iterator delivers the objects that are already in the container.
But when you iterate over a string, a new object has to be created for each character delivered - a string is not "a container" in the same sense a list is a container. The individual characters in a string don't exist as distinct objects before iteration creates those objects.
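On CPython the picture is muddied by the one-character cache: latin-1 characters come back as the same cached objects, but characters outside that range really are materialised afresh on each pass. A sketch (`"ΔΣ"` is just an arbitrary non-latin-1 example):

```python
s = "\u0394\u03a3"  # "ΔΣ": code points above the latin-1 cache range
a = [c for c in s]
b = [c for c in s]
# Equal values, but distinct objects: each pass builds new
# one-character strings for non-latin-1 code points.
print(a == b)
print(a[0] is b[0])
```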
You could be incurring an overhead for creating the iterator for the string, whereas the array already contains an iterator upon instantiation.
EDIT:
This was run using Python 2.7, on my MacBook Pro (i7). The discrepancy could be the result of a system configuration difference.