I don't come from a software/computer science background, but I love to code in Python and can generally understand why things are faster. I am really curious to know why the for loop below runs faster than the dictionary comprehension. Any insights?
Problem: Given a dictionary `a` with these keys and values, return a dictionary with the values as keys and the keys as values. (Challenge: do this in one line.)
and the code:

    a = {'a': 'hi', 'b': 'hey', 'c': 'yo'}

    # loop version
    b = {}
    for i, j in a.items():
        b[j] = i
    # %%timeit: 932 ns ± 37.2 ns per loop

    # dict comprehension version
    b = {v: k for k, v in a.items()}
    # %%timeit: 1.08 µs ± 16.4 ns per loop
You are testing with way too small an input; while a dictionary comprehension doesn't have as much of a performance advantage over a `for` loop as a list comprehension does, for realistic problem sizes it can and does beat `for` loops, especially when targeting a global name.

Your input consists of just 3 key-value pairs. Testing with 1000 elements instead, the timings come out very close.
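A minimal sketch of that kind of test (the helper names and `str(i)` keys are illustrative choices, and absolute numbers vary by machine and Python version):

    from timeit import timeit

    a = {str(i): i for i in range(1000)}  # 1000 key-value pairs

    def with_loop():
        b = {}
        for k, v in a.items():
            b[v] = k
        return b

    def with_comp():
        return {v: k for k, v in a.items()}

    print(timeit(with_loop, number=10_000))  # total seconds, loop
    print(timeit(with_comp, number=10_000))  # total seconds, comprehension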
The difference is there, and the dict comprehension is faster, but only just at this scale. With 100 times as many key-value pairs (try `range(100000)` in the sketch above) the difference is a bit bigger, yet still not that big when you consider that both process nearly 100k key-value pairs. Still, the `for` loop is clearly slower.

So why the speed difference with 3 elements? Because a comprehension (a dictionary, set or list comprehension, or a generator expression) is under the hood implemented as a new function, and calling that function has a base cost the plain loop doesn't have to pay.
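You can produce the disassembly for both alternatives yourself with the `dis` module. A minimal sketch (assuming CPython; exact opcode names vary by version, and CPython 3.12+ inlines comprehensions, so run an older interpreter to see `MAKE_FUNCTION`):

    import dis

    loop_src = """
    b = {}
    for i, j in a.items():
        b[j] = i
    """
    comp_src = "b = {v: k for k, v in a.items()}"

    # We only compile, never run, so the free variable `a` doesn't matter.
    # dis.dis() recursively disassembles nested code objects (3.7+), so the
    # comprehension's hidden function shows up as its own section.
    dis.dis(compile(loop_src, "<loop>", "exec"))
    dis.dis(compile(comp_src, "<dictcomp>", "exec"))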
In that disassembly, note the `MAKE_FUNCTION` and `CALL_FUNCTION` opcodes in the top-level bytecode for the dict comprehension; there is a separate section for what that function then does, and there are actually very few differences between the two approaches.

The material differences: the looped code uses `LOAD_NAME` for `b` on each iteration, plus `STORE_SUBSCR` to store the key-value pair in the dict just loaded. The dictionary comprehension uses `MAP_ADD` to achieve the same thing as `STORE_SUBSCR`, but doesn't have to load the `b` name each time.

But with only 3 iterations, the `MAKE_FUNCTION` / `CALL_FUNCTION` combo the dict comprehension has to execute is the real drag on performance: more than 0.1 μs just to create a function object with one argument and call it (with an extra `LOAD_CONST` for the `None` value we pass in)! And that's just about the difference between the looped and comprehension timings for 3 key-value pairs.

You can liken this to being surprised that a man with a shovel can dig a small hole faster than a backhoe can. The backhoe can certainly dig fast, but a man with a shovel is already digging while you're still getting the backhoe started and moved into position!
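For a rough feel for that base cost, here's a sketch that times creating and calling a trivial one-argument function (the lambda is a stand-in for the hidden comprehension function; treat the result as an order of magnitude, not a precise figure):

    from timeit import timeit

    # Each pass creates a fresh one-argument function object and then calls it,
    # mimicking the MAKE_FUNCTION / CALL_FUNCTION work the comprehension pays for.
    n = 1_000_000
    total = timeit("(lambda it: None)(None)", number=n)
    print(f"{total / n * 1e9:.0f} ns per create-and-call")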
Beyond a few key-value pairs (digging a bigger hole), the function create-and-call cost fades away into nothingness. At that point the dict comprehension and the explicit loop basically do the same thing: both trigger the `dict.__setitem__` hook via a bytecode operation that takes the top two items on the stack (either `STORE_SUBSCR` or `MAP_ADD`), and this doesn't count as a 'function call' as it's all handled internally in the interpreter loop.

This is different from a list comprehension, where the plain loop version would have to use `list.append()`, involving an attribute lookup and a function call on each loop iteration. The list comprehension's speed advantage comes from this difference; see Python list comprehension expensive.

What a dict comprehension does add is that the target dictionary name only needs to be looked up once, when binding `b` to the final dictionary object. If the target dictionary is a global instead of a local variable, the comprehension wins, hands down, as the sketch after the summary below illustrates.

So just use a dict comprehension. The difference with < 30 elements to process is too small to care about, and the moment you are targeting a global or have more items, the dict comprehension wins out anyway.
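A rough sketch for the global-target case (helper names are illustrative; sizes as before):

    from timeit import timeit

    a = {str(i): i for i in range(1000)}

    def loop_into_global():
        global b
        b = {}
        for k, v in a.items():
            b[v] = k  # LOAD_GLOBAL for b, then STORE_SUBSCR, every iteration

    def comp_into_global():
        global b
        b = {v: k for k, v in a.items()}  # b is looked up and bound just once

    print(timeit(loop_into_global, number=10_000))
    print(timeit(comp_into_global, number=10_000))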
This question is, in some senses, quite similar to Why is a list comprehension so much faster than appending to a list?, which I answered a long while ago. The reason this behavior is surprising to you is that your dictionary is way too small to overcome the cost of creating a new function frame and pushing/popping it on the stack. To understand that better, let's go under the skin of the two snippets you have.
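A quick way to peek under the skin (a sketch; on CPython 3.7+, `dis.dis()` accepts a source string and recurses into nested code objects):

    import dis

    # On pre-3.12 CPython, the top-level code shows MAKE_FUNCTION plus a call
    # opcode (CALL_FUNCTION on older versions), and the nested <dictcomp> code
    # object is disassembled as its own section.
    dis.dis("b = {v: k for k, v in a.items()}")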
In the disassembled dict comprehension you'll find a `MAKE_FUNCTION` opcode which, as stated in the documentation, pushes a new function object on the stack, and later a `CALL_FUNCTION`, which calls a callable object with positional arguments.

All these operations have their costs, but when the dictionary gets larger, the cost of assigning the key-value items to the dictionary becomes larger than the cost of creating a function under the hood. In other words, from a certain point on, the cost of calling the dictionary's `__setitem__` method exceeds the cost of creating and calling a function object on the fly.

Also, note that there are certainly other operations (opcodes, in this case) that play a crucial role in this game, which I think are worth investigating; I'll leave that to you as an exercise ;).
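If you want to find that break-even point on your own setup, here's a sketch (sizes and repeat counts are arbitrary; compare the two numbers within each row, not across rows):

    from timeit import timeit

    for n in (3, 100, 10_000):
        a = {str(i): i for i in range(n)}

        def with_loop():
            b = {}
            for k, v in a.items():
                b[v] = k

        def with_comp():
            b = {v: k for k, v in a.items()}

        reps = 100_000 // n + 1  # keep total work roughly comparable
        print(n, timeit(with_loop, number=reps), timeit(with_comp, number=reps))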