def main():
    for i in xrange(10**8):
        pass
main()
This piece of Python code runs in the following time (note: the timing is done with the time builtin in Bash on Linux):
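For instance, with the snippet saved to a file (loop.py is a hypothetical name):

$ time python loop.py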
real 0m1.841s
user 0m1.828s
sys 0m0.012s
However, if the for loop isn't placed within a function,
for i in xrange(10**8):
    pass
then it runs for a much longer time:
real 0m4.543s
user 0m4.524s
sys 0m0.012s
Why is this?
You might ask why it is faster to store local variables than globals. This is a CPython implementation detail.
Remember that CPython is compiled to bytecode, which the interpreter runs. When a function is compiled, the local variables are stored in a fixed-size array (not a dict) and variable names are assigned to indexes. This is possible because you can't dynamically add local variables to a function. Retrieving a local variable is then literally a pointer lookup into the array and a refcount increase on the PyObject, which is trivial.

Contrast this to a global lookup (LOAD_GLOBAL), which is a true dict search involving a hash and so on. Incidentally, this is why you need to specify global i if you want it to be global: if you ever assign to a variable inside a scope, the compiler will issue STORE_FASTs for its access unless you tell it not to. (By the way, global lookups are still pretty optimised; attribute lookups foo.bar are the really slow ones!)

Here is a small illustration of local-variable efficiency.
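A minimal sketch (not from the original answers), assuming CPython 2.x, that times repeated reads of a global name against reads of a local using the stdlib timeit module:

import timeit

g = 42

def read_global():
    # each read of g is a LOAD_GLOBAL: a hash-based dict lookup
    total = 0
    for _ in xrange(10000):
        total += g
    return total

def read_local():
    # each read of x is a LOAD_FAST: an index into the locals array
    x = 42
    total = 0
    for _ in xrange(10000):
        total += x
    return total

print timeit.timeit(read_global, number=1000)
print timeit.timeit(read_local, number=1000)

The local version should come out noticeably faster; the exact ratio depends on the build and machine.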
Inside a function, the bytecode is
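(a reconstruction using CPython 2.7's dis module; exact byte offsets and constant indices may vary between versions):

  2           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)
             19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE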
At top level, the bytecode is
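(again a reconstruction; note LOAD_NAME/STORE_NAME in place of LOAD_GLOBAL/STORE_FAST):

  1           0 SETUP_LOOP              20 (to 23)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_NAME               1 (i)
             19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               2 (None)
             26 RETURN_VALUE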
The difference is that STORE_FAST is faster (!) than STORE_NAME. This is because in a function, i is a local, but at top level it is a global.

To examine bytecode, use the dis module. I was able to disassemble the function directly, but to disassemble the toplevel code I had to use the compile builtin.
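A sketch of how to produce both listings (Python 2, matching the xrange in the question):

import dis

def main():
    for i in xrange(10**8):
        pass

dis.dis(main)  # a function can be disassembled directly

# toplevel code must be compiled to a code object first
code = compile("for i in xrange(10**8):\n    pass", "<toplevel>", "exec")
dis.dis(code)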
Aside from local/global variable store times, opcode prediction makes the function faster.
As the other answers explain, the function uses the STORE_FAST opcode in the loop; the bytecode for the loop is shown below.

Normally when a program is run, Python executes each opcode one after the other, keeping track of the stack and performing other checks on the stack frame after each opcode is executed. Opcode prediction means that in certain cases Python is able to jump directly to the next opcode, thus avoiding some of this overhead.
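The loop portion looks like this (reconstructed with CPython 2.7; offsets may differ):

        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)
             19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK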
In this case, every time Python sees FOR_ITER (the top of the loop), it will "predict" that STORE_FAST is the next opcode it has to execute. Python then peeks at the next opcode and, if the prediction was correct, it jumps straight to STORE_FAST. This has the effect of squeezing the two opcodes into a single opcode.

On the other hand, the STORE_NAME opcode is used in the loop at the global level. Python does *not* make similar predictions when it sees this opcode. Instead, it must go back to the top of the evaluation loop, which has obvious implications for the speed at which the loop is executed.

To give some more technical detail about this optimization, here's a quote from the ceval.c file (the "engine" of Python's virtual machine):
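(Abridged from the comment above the PREDICT macro in CPython 2.7's ceval.c; the wording varies slightly between versions.)

    Some opcodes tend to come in pairs thus making it possible to
    predict the second code when the first is run.  For example,
    GET_ITER is often followed by FOR_ITER.  And FOR_ITER is often
    followed by STORE_FAST or UNPACK_SEQUENCE.

    A successful prediction saves a trip through the eval-loop
    including its two unpredictable branches, the HAS_ARG test and
    the switch-case.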
We can see in the source code for the FOR_ITER opcode exactly where the prediction for STORE_FAST is made:
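(Abridged from CPython 2.7's ceval.c; the end-of-iteration and error-handling branches are omitted.)

case FOR_ITER:
    /* before: [iter]; after: [iter, iter()] *or* [] */
    v = TOP();
    x = (*v->ob_type->tp_iternext)(v);
    if (x != NULL) {
        PUSH(x);
        PREDICT(STORE_FAST);
        PREDICT(UNPACK_SEQUENCE);
        continue;
    }
    /* ... */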
The PREDICT macro expands to if (*next_instr == op) goto PRED_##op, i.e. we just jump to the start of the predicted opcode. In this case, we jump to the body of STORE_FAST, sketched below. The local variable is now set and the next opcode is up for execution. Python continues through the iterable until it reaches the end, making the successful prediction each time.
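(The jump target, again abridged from CPython 2.7's ceval.c.)

PREDICTED_WITH_ARG(STORE_FAST);
case STORE_FAST:
    v = POP();
    SETLOCAL(oparg, v);
    goto fast_next_opcode;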
The Python wiki page has more information about how CPython's virtual machine works.