This message is a bit long with many examples, but I hope it will help me and others to better grasp the full story of variables and attribute lookup in Python 2.7.
I am using the terms of PEP 227 (http://www.python.org/dev/peps/pep-0227/) for code blocks (such as modules, class definitions, function definitions, etc.) and variable bindings (such as assignments, argument declarations, class and function declarations, for loops, etc.).
I am using the term variables for names that can be used without a dot, and attributes for names that need to be qualified with an object name (such as obj.x for the attribute x of object obj).
There are three scopes in Python for all code blocks except functions:
- Local
- Global
- Builtin
There are four scopes in Python for functions only (according to PEP 227):
- Local
- Enclosing functions
- Global
- Builtin
The rules for binding a variable to a block and for finding it in a block are quite simple:
- any binding of a variable to an object in a block makes this variable local to this block, unless the variable is declared global (in that case the variable belongs to the global scope)
- a reference to a variable is looked up using the rule LGB (local, global, builtin) for all blocks except functions
- a reference to a variable is looked up using the rule LEGB (local, enclosing, global, builtin) for functions only.
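As a minimal sketch of the lookup rules listed above (the names level, read_local, read_enclosing, read_global, read_builtin and Body are made up for illustration, and the code avoids the print statement so that it should run unchanged on Python 2.7 and Python 3):

```python
level = "global"


def read_local():
    level = "local"
    return level              # L: bound in this function


def read_enclosing():
    level = "enclosing"

    def inner():
        return level          # E: bound in the enclosing function

    return inner()


def read_global():
    return level              # G: not bound in any function


def read_builtin():
    return len                # B: not local, enclosing or global


class Body(object):
    # A class body that does not bind `level` itself sees the outer value.
    level_seen = level
```

Calling the four functions returns "local", "enclosing", "global" and the builtin len respectively, and Body.level_seen is "global".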
Let me now take examples validating these rules and showing many special cases. For each example, I will give my understanding. Please correct me if I am wrong. For the last example, I don't understand the outcome.
example 1:
x = "x in module"
class A():
    print "A: " + x #x in module
    x = "x in class A"
    print locals()
    class B():
        print "B: " + x #x in module
        x = "x in class B"
        print locals()
        def f(self):
            print "f: " + x #x in module
            self.x = "self.x in f"
            print x, self.x
            print locals()
>>> A.B().f()
A: x in module
{'x': 'x in class A', '__module__': '__main__'}
B: x in module
{'x': 'x in class B', '__module__': '__main__'}
f: x in module
x in module self.x in f
{'self': <__main__.B instance at 0x00000000026FC9C8>}
There is no nested scope for classes (rule LGB), and a function in a class cannot access the attributes of the class without using a qualified name (self.x in this example). This is well described in PEP 227.
example 2:
z = "z in module"
def f():
    z = "z in f()"
    class C():
        z = "z in C"
        def g(self):
            print z
            print C.z
    C().g()
f()
>>>
z in f()
z in C
Here variables in functions are looked up using the LEGB rule, but if a class is in the path, the class attributes are skipped. Here again, this is what PEP 227 explains.
example 3:
var = 0
def func():
    print var
    var = 1
>>> func()
Traceback (most recent call last):
  File "<pyshell#102>", line 1, in <module>
    func()
  File "C:/Users/aa/Desktop/test2.py", line 25, in func
    print var
UnboundLocalError: local variable 'var' referenced before assignment
With a dynamic language such as Python, we expect everything to be resolved dynamically. But this is not the case for functions: local variables are determined at compile time. PEP 227 and http://docs.python.org/2.7/reference/executionmodel.html describe this behavior this way:
"If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block."
example 4:
x = "x in module"
class A():
    print "A: " + x
    x = "x in A"
    print "A: " + x
    print locals()
    del x
    print locals()
    print "A: " + x
>>>
A: x in module
A: x in A
{'x': 'x in A', '__module__': '__main__'}
{'__module__': '__main__'}
A: x in module
But we see here that this statement in PEP 227, "If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block.", is wrong when the code block is a class. Moreover, for classes, it seems that local name binding is not made at compile time, but during execution using the class namespace. In that respect, PEP 227 and the execution model in the Python documentation are misleading and, for some parts, wrong.
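A hedged sketch of that contrast, using the hypothetical names in_function and InClass: the same bind, delete, read sequence is run once in a function body and once in a class body. It assumes the code runs at module level, so that x is a true global, and it avoids the print statement so that it should behave the same on Python 2.7 and Python 3.

```python
x = "x in module"


def in_function():
    x = "x in function"
    del x
    try:
        return x                  # x is still a local name, now unbound
    except UnboundLocalError:
        return "UnboundLocalError"


class InClass(object):
    x = "x in class"
    del x
    after_del = x                 # the lookup falls back to the module global
```

Here in_function() returns "UnboundLocalError", while InClass.after_del is "x in module".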
example 5:
x = 'x in module'
def f2():
    x = 'x in f2'
    def myfunc():
        x = 'x in myfunc'
        class MyClass(object):
            x = x
            print x
        return MyClass
    myfunc()
f2()
>>>
x in module
My understanding of this code is the following. The instruction x = x first looks up the object that the right-hand x of the expression refers to. In that case, the object is looked up locally in the class, then, following the rule LGB, it is looked up in the global scope, where it is the string 'x in module'. Then a local attribute x of MyClass is created in the class dictionary and pointed to the string object.
example 6:
Now here is an example I cannot explain. It is very close to example 5, I am just changing the local MyClass attribute from x to y.
x = 'x in module'
def f2():
    x = 'x in f2'
    def myfunc():
        x = 'x in myfunc'
        class MyClass(object):
            y = x
            print y
        return MyClass
    myfunc()
f2()
>>>
x in myfunc
Why, in that case, is the x reference in MyClass looked up in the innermost function?
In two words, the difference between example 5 and example 6 is that in example 5 the variable x is also assigned to in the same scope, while not in example 6. This triggers a difference that can be understood by historical reasons.
Consider a module-level x = "foo" and a function f() that prints x and then assigns to x. Calling f() raises UnboundLocalError instead of printing "foo". It makes a bit of sense, even if it seems strange at first: the function f() defines the variable x locally, even if it is after the print, and so any reference to x in the same function must be to that local variable. At least it makes sense in that it avoids strange surprises if you have by mistake reused the name of a global variable locally, and are trying to use both the global and the local variable. This is a good idea because it means that we can statically know, just by looking at a variable, which variable it means. For example, if a function assigns to x anywhere in its body, we know that print x in that function refers to the local variable (and thus may raise UnboundLocalError).
Now, this rule doesn't work for class-level scopes: there, we want expressions like x = x to work, capturing the global variable x into the class-level scope. This means that class-level scopes don't follow the basic rule above: we can't know if x in this scope refers to some outer variable or to the locally-defined x.
So in class scopes, a different rule is used: where it would normally raise UnboundLocalError (and only in that case) it instead looks up in the module globals. That's all: it doesn't follow the chain of nested scopes.
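The class-scope rule just described can be sketched as follows. This is only an illustration under stated assumptions: the names X, Z, build and Y are hypothetical, the code is assumed to run at module level so that x is a true global, and it is written to run on both Python 2.7 and Python 3.

```python
x = "global x"


class X(object):
    x = x.upper()    # reads the global x, then binds a class-local x
    bar = x          # reads the class-local x bound on the previous line


class Z(object):
    seen = []
    for _ in range(2):
        seen.append(x)   # global x on the first pass, class-local x on the second
        x = 42


def build():
    x = "enclosing x"

    class Y(object):
        x = x        # falls back to the module globals, skipping build()'s x

    return Y.x
```

X.x and X.bar are both "GLOBAL X", Z.seen is ["global x", 42], and build() returns "global x": the x of the enclosing function is never consulted.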
Why not? I actually doubt there is a better explanation than "for historical reasons". In more technical terms, it could consider that the variable x is both locally defined in the class scope (because it is assigned to) and should be passed in from the parent scope as a lexically nested variable (because it is read). It would be possible to implement it by using a different bytecode than LOAD_NAME that looks up in the local scope, and falls back to using the nested scope's reference if not found.
EDIT: thanks wilberforce for the reference to http://bugs.python.org/issue532860. We may have a chance to get some discussion reactivated with the proposed new bytecode, if we feel that it should be fixed after all (the bug report considers killing support for x = x but was closed for fear of breaking too much existing code; instead what I'm suggesting here would be to make x = x work in more cases). Or I may be missing another fine point...
EDIT2: it seems that CPython did precisely that in the current 3.4 trunk: http://bugs.python.org/issue17853 ... or not? They introduced the bytecode for a slightly different reason and don't use it systematically...
In an ideal world, you'd be right and some of the inconsistencies you found would be wrong. However, CPython has optimized some scenarios, specifically function locals. These optimizations, together with how the compiler and evaluation loop interact and historical precedent, lead to the confusion.
Python translates code to bytecodes, and those are then interpreted by an interpreter loop. The 'regular' opcode for accessing a name is LOAD_NAME, which looks up a variable name as you would in a dictionary. LOAD_NAME will first look up a name as a local, and if that fails, looks for a global. LOAD_NAME throws a NameError exception when the name is not found.
For nested scopes, looking up names outside of the current scope is implemented using closures; if a name is not assigned to but is available in an enclosing (not global) scope, then such values are handled as a closure. This is needed because a parent scope can hold different values for a given name at different times; two calls to a parent function can lead to different closure values. So Python has LOAD_CLOSURE, MAKE_CLOSURE and LOAD_DEREF opcodes for that situation; the first two opcodes are used in loading and creating a closure for a nested scope, and the LOAD_DEREF will load the closed-over value when the nested scope needs it.
LOAD_NAME is relatively slow; it will consult two dictionaries, which means it has to hash the key first and run a few equality tests (if the name wasn't interned). If the name isn't local, then it has to do this again for a global. For functions, that can potentially be called tens of thousands of times, this can get tedious fast. So function locals have special opcodes. Loading a local name is implemented by LOAD_FAST, which looks up local variables by index in a special local names array. This is much faster, but it does require that the compiler first has to see if a name is a local and not global. To still be able to look up global names, another opcode LOAD_GLOBAL is used. The compiler explicitly optimizes for this case to generate the special opcodes. LOAD_FAST will throw an UnboundLocalError exception when there is not yet a value for the name.
Class definition bodies on the other hand, although they are treated much like a function, do not get this optimization step. Class definitions are not meant to be called all that often; most modules create classes once, when imported. Class scopes don't count when nesting either, so the rules are simpler. As a result, class definition bodies do not act like functions when you start mixing scopes up a little.
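The opcode split between function bodies and class bodies can be checked with the dis module. The following is a sketch under a few assumptions: it uses dis.get_instructions, which exists in Python 3.4 and later (on 2.7, dis.dis prints an equivalent listing to read by eye); SOURCE, func, Body, nested_code and load_opcodes are hypothetical names; and because exact opcode names vary between CPython versions, it only relies on the LOAD_FAST prefix for locals.

```python
import dis

SOURCE = """
total = 0

def func(local_value):
    return local_value + total

class Body(object):
    attr = 1
    other = attr + total
"""


def nested_code(code, name):
    """Return the code object called `name` stored in code.co_consts."""
    for const in code.co_consts:
        if hasattr(const, "co_code") and const.co_name == name:
            return const
    raise LookupError(name)


def load_opcodes(code):
    """Map each variable name to the name of the opcode that loads it."""
    prefixes = ("LOAD_FAST", "LOAD_GLOBAL", "LOAD_NAME")
    return {
        ins.argval: ins.opname
        for ins in dis.get_instructions(code)
        if ins.opname.startswith(prefixes) and isinstance(ins.argval, str)
    }


module_code = compile(SOURCE, "<sketch>", "exec")
func_ops = load_opcodes(nested_code(module_code, "func"))
body_ops = load_opcodes(nested_code(module_code, "Body"))
```

In func_ops, local_value maps to a LOAD_FAST opcode and total to LOAD_GLOBAL, while in body_ops both attr and total map to LOAD_NAME.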
So, for non-function scopes, LOAD_NAME and LOAD_DEREF are used for locals and globals, and for closures, respectively. For functions, LOAD_FAST, LOAD_GLOBAL and LOAD_DEREF are used instead.
Note that class bodies are executed as soon as Python executes the class line! So in example 1, class B inside class A is executed as soon as class A is executed, which is when you import the module. In example 2, C is not executed until f() is called, not before.
Let's walk through your examples:
Example 1: You have nested a class A.B in a class A. Class bodies do not form nested scopes, so even though the A.B class body is executed when class A is executed, the compiler will use LOAD_NAME to look up x. A.B().f() is a function (bound to the B() instance as a method), so it uses LOAD_GLOBAL to load x. We'll ignore attribute access here, that's a very well defined name pattern.
Example 2: Here f().C.z is at class scope, so the function f().C().g() will skip the C scope and look at the f() scope instead, using LOAD_DEREF.
Example 3: Here var was determined to be a local by the compiler because you assign to it within the scope. Functions are optimized, so LOAD_FAST is used to look up the local and an exception is thrown.
Example 4: Now things get a little weird. class A is executed at class scope, so LOAD_NAME is being used. A.x was deleted from the locals dictionary for the scope, so the second access to x results in the global x being found instead; LOAD_NAME looked for a local first and didn't find it there, falling back to the global lookup.
Yes, this appears inconsistent with the documentation. Python-the-language and CPython-the-implementation are clashing a little here. You are, however, pushing the boundaries of what is possible and practical in a dynamic language; checking if x should have been a local in LOAD_NAME would be possible but takes precious execution time for a corner case that most developers will never run into.
Example 5: Now you are confusing the compiler. You used x = x in the class scope, and thus you are setting a local from a name outside of the scope. The compiler finds x is a local here (you assign to it), so it never considers that it could also be a scoped name. The compiler uses LOAD_NAME for all references to x in this scope, because this is not an optimized function body.
When executing the class definition, x = x first requires you to look up x, so it uses LOAD_NAME to do so. No x is defined, LOAD_NAME doesn't find a local, so the global x is found. The resulting value is stored as a local, which happens to be named x as well. print x uses LOAD_NAME again, and now finds the new local x value.
Example 6: Here you did not confuse the compiler. You are creating a local y, x is not local, so the compiler recognizes it as a scoped name from parent function f2().myfunc(). x is looked up with LOAD_DEREF from the closure, and stored in y.
You could see the confusion between 5 and 6 as a bug, albeit one that is not worth fixing in my opinion. It was certainly filed as such, see issue 532860 in the Python bug tracker, it has been there for over 10 years now.
The compiler could check for a scoped name x even when x is also a local, for that first assignment in example 5. Or LOAD_NAME could check if the name is meant to be a local, really, and throw an UnboundLocalError if no local was found, at the expense of more performance. Had this been in a function scope, LOAD_FAST would have been used for example 5, and an UnboundLocalError would be thrown immediately.
However, as the referenced bug shows, for historical reasons the behaviour is retained. There probably is code out there today that'll break were this bug fixed.
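For comparison, here is a small sketch of the function-scope case mentioned above, where x = x sits in a nested function rather than a class body (outer and body are hypothetical names, and the code should run on both Python 2.7 and Python 3):

```python
def outer():
    x = "x in outer"

    def body():
        x = x        # the assignment makes x local to body(), so this read fails
        return x

    try:
        return body()
    except UnboundLocalError:
        return "UnboundLocalError"
```

Calling outer() returns "UnboundLocalError": unlike the class body in example 5, the function body never falls back to an outer x.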
Long story short, this is a corner case of Python's scoping that is a bit inconsistent, but has to be kept for backwards compatibility (and because it's not that clear what the right answer should be). You can see lots of the original discussion about it on the Python mailing list when PEP 227 was being implemented, and some in the bug for which this behaviour is the fix.
We can work out why there's a difference using the dis module, which lets us look inside code objects to see the bytecode a piece of code has been compiled to. I'm on Python 2.6, so the details of this might be slightly different - but I see the same behaviour, so I think it's probably close enough to 2.7.
The code that initialises each nested MyClass lives in a code object that you can get to via the attributes of the top-level functions. (I'm renaming the functions from example 5 and example 6 to f1 and f2 respectively.)
The code object has a co_consts tuple, which contains the myfunc code object, which in turn has the code that runs when MyClass gets created. Then you can see the difference between them in bytecode using dis.dis.
So the only difference is that in MyClass1 (the class body from f1), x is loaded using the LOAD_NAME op, while in MyClass2 (the class body from f2), it's loaded using LOAD_DEREF. LOAD_DEREF looks up a name in an enclosing scope, so it gets 'x in myfunc'. LOAD_NAME doesn't follow nested scopes - since it can't see the x names bound in myfunc or f1, it gets the module-level binding.
Then the question is, why does the code of the two versions of MyClass get compiled to two different opcodes? In f1 the binding is shadowing x in the class scope, while in f2 it's binding a new name. If the MyClass scopes were nested functions instead of classes, the y = x line in f2 would be compiled the same, but the x = x in f1 would be a LOAD_FAST - this is because the compiler would know that x is bound in the function, so it should use the LOAD_FAST to retrieve a local variable. This would fail with an UnboundLocalError when it was called, because the MyFunc function (MyClass rewritten as a function) then uses LOAD_FAST.
(As an aside, it's not a big surprise that there should be a difference in how scoping interacts with code in the body of classes and code in a function. You can tell this because bindings at the class level aren't available in methods - method scopes aren't nested inside the class scope in the same way as nested functions are. You have to explicitly reach them via the class, or by using self. (which will fall back to the class if there's not also an instance-level binding).)
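The co_consts walk and the bytecode comparison described in this answer can be sketched roughly as below. This is not the original session: the helper names nested_code, class_body_code and ops_loading_x are invented here, f1 and f2 return the class instead of printing, the code is assumed to run at module level, and dis.get_instructions needs Python 3.4 or later. On Python 2.x the second opcode is LOAD_DEREF as stated above; on Python 3 I would expect a class-specific variant (for example LOAD_CLASSDEREF), so the sketch only checks that the opcode name contains DEREF.

```python
import dis

x = 'x in module'


def f1():
    x = 'x in f1'

    def myfunc():
        x = 'x in myfunc'

        class MyClass(object):
            x = x            # example 5: x is also bound in the class body

        return MyClass

    return myfunc()


def f2():
    x = 'x in f2'

    def myfunc():
        x = 'x in myfunc'

        class MyClass(object):
            y = x            # example 6: x is only read in the class body

        return MyClass

    return myfunc()


def nested_code(code, name):
    """Return the code object called `name` stored in code.co_consts."""
    for const in code.co_consts:
        if hasattr(const, "co_code") and const.co_name == name:
            return const
    raise LookupError(name)


def class_body_code(func):
    """Walk func -> myfunc -> MyClass through the co_consts tuples."""
    return nested_code(nested_code(func.__code__, "myfunc"), "MyClass")


def ops_loading_x(code):
    """List the names of the opcodes that load the variable x."""
    return [ins.opname for ins in dis.get_instructions(code)
            if ins.opname.startswith("LOAD_") and ins.argval == "x"]


ops_example_5 = ops_loading_x(class_body_code(f1))
ops_example_6 = ops_loading_x(class_body_code(f2))
```

ops_example_5 is ['LOAD_NAME'], ops_example_6 holds a single DEREF-style opcode, and accordingly f1().x is 'x in module' while f2().y is 'x in myfunc'.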