I ran across this case of UnboundLocalError recently, which seems strange:
    import pprint

    def main():
        if 'pprint' in globals(): print 'pprint is in globals()'
        pprint.pprint('Spam')
        from pprint import pprint
        pprint('Eggs')

    if __name__ == '__main__': main()
Which produces:
    pprint is in globals()
    Traceback (most recent call last):
      File "weird.py", line 9, in <module>
        if __name__ == '__main__': main()
      File "weird.py", line 5, in main
        pprint.pprint('Spam')
    UnboundLocalError: local variable 'pprint' referenced before assignment
pprint is clearly bound in globals, and is going to be bound in locals in the following statement. Can someone offer an explanation of why it isn't happy resolving pprint to the binding in globals here?
Edit: Thanks to the good responses I can clarify my question with relevant terminology:
At compile time the identifier pprint is marked as local to the frame. Does the execution model make no distinction about where within the frame the local identifier becomes bound? Can it say, "refer to the global binding up until this bytecode instruction, at which point it has been rebound to a local binding," or does the execution model not account for this?
It looks like Python sees the from pprint import pprint line and marks pprint as a name local to main() before executing any code. Since Python treats pprint as a local variable, referencing it with pprint.pprint() before "assigning" it with the from..import statement throws that error.
That's as much sense as I can make of that.
The moral, of course, is to always put those import statements at the top of the scope.
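For what it's worth, the compile-time marking can be checked without running the function. This is only a minimal sketch, written for Python 3 rather than the Python 2 used in the question. It relies on the code object attribute co_varnames, which lists the names the compiler treats as local; the names local_names and error are just for illustration.

```python
import pprint  # module-level binding, as in the question


def main():
    # The import two lines down makes `pprint` local to main(), so this
    # lookup uses the still-empty local variable, not the global module.
    pprint.pprint('Spam')
    from pprint import pprint
    pprint('Eggs')


# Inspect the compiled function before it has ever been called.
local_names = main.__code__.co_varnames
print(local_names)

error = None
try:
    main()
except UnboundLocalError as exc:
    error = exc  # keep a reference; `exc` is cleared after the except block
    print(type(exc).__name__)
```

The name shows up in co_varnames before main() is ever called, which matches the "before executing any code" observation above.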
Where's the surprise? Any variable global to a scope that you reassign within that scope is marked local to that scope by the compiler.
If imports were handled differently, that would be surprising, IMHO.
It may make a case for not naming modules after symbols used therein, or vice versa, though.
Well, that was interesting enough for me to experiment a bit, and I read through http://docs.python.org/reference/executionmodel.html
Then I did some tinkering with your code here and there; this is what I could find:
code:
    import pprint

    def two():
        from pprint import pprint
        print globals()['pprint']
        pprint('Eggs')
        print globals()['pprint']

    def main():
        if 'pprint' in globals():
            print 'pprint is in globals()'
        global pprint
        print globals()['pprint']
        pprint.pprint('Spam')
        from pprint import pprint
        print globals()['pprint']
        pprint('Eggs')

    def three():
        print globals()['pprint']
        pprint.pprint('Spam')

    if __name__ == '__main__':
        two()
        print('\n')
        three()
        print('\n')
        main()
output:
    <module 'pprint' from '/usr/lib/python2.5/pprint.pyc'>
    'Eggs'
    <module 'pprint' from '/usr/lib/python2.5/pprint.pyc'>
    <module 'pprint' from '/usr/lib/python2.5/pprint.pyc'>
    'Spam'
    pprint is in globals()
    <module 'pprint' from '/usr/lib/python2.5/pprint.pyc'>
    'Spam'
    <function pprint at 0xb7d596f4>
    'Eggs'
In the function two(), from pprint import pprint binds the name pprint locally but does not override the name pprint in globals, since the global keyword is not used in the scope of two().
In the function three(), since there is no declaration of the name pprint in the local scope, it defaults to the global name pprint, which is a module.
Whereas in main(), the global keyword is used first, so all references to pprint in the scope of main() refer to the global name pprint. As we can see, it is a module at first and is then overridden in the global namespace with a function when we do the from pprint import pprint.
Though this may not answer the question as such, I think it is an interesting fact nevertheless.
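The difference between two() and main() can be condensed into a short check. This is a rough sketch in Python 3 syntax (the code above is Python 2); the function names local_import and global_import and the result variables are made up for illustration.

```python
import pprint
import types

original_module = pprint  # keep a handle so the global can be restored


def local_import():
    from pprint import pprint  # binds a local name only
    return pprint


def global_import():
    global pprint              # `pprint` in this function is the global name
    from pprint import pprint  # so this import rebinds the global
    return pprint


local_result = local_import()
# The global name is untouched by the local import.
still_module = isinstance(pprint, types.ModuleType)

global_import()
# Now the global name refers to the function, not the module.
now_function = callable(pprint) and not isinstance(pprint, types.ModuleType)

pprint = original_module  # undo the rebinding
print(still_module, now_function)
```

Restoring the module at the end is only there so the sketch does not leave the global name pointing at the function.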
=====================
Edit: Another interesting thing.
If you have a module, say:
mod1
    from datetime import datetime

    def foo():
        print "bar"
and another module, say:
mod2
    import datetime
    from mod1 import *

    if __name__ == '__main__':
        print datetime.datetime.now()
This looks correct at first sight, since you have imported the module datetime in mod2.
Now if you try to run mod2 as a script, it will throw an error:
    Traceback (most recent call last):
      File "mod2.py", line 5, in <module>
        print datetime.datetime.now()
    AttributeError: type object 'datetime.datetime' has no attribute 'datetime'
because the second import, from mod1 import *, has overridden the name datetime in the namespace, hence the binding from the first import datetime is no longer valid.
Moral: The order of imports, the nature of imports (from x import *), and awareness of imports within imported modules all matter.
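The shadowing can be reproduced in a single file. This is a minimal sketch in Python 3, under the assumption that an in-memory module is a fair stand-in for mod1; the module name mod1_sketch and the dict namespace (playing the role of mod2's globals) are hypothetical.

```python
import datetime as datetime_module
import sys
import types

# Hypothetical stand-in for mod1: a module whose only public name is
# `datetime`, bound to the datetime class.
mod1 = types.ModuleType("mod1_sketch")
exec("from datetime import datetime", mod1.__dict__)
sys.modules["mod1_sketch"] = mod1

# Stand-in for mod2's global namespace: same two imports, same order.
namespace = {}
exec("import datetime\nfrom mod1_sketch import *", namespace)

shadowed = namespace["datetime"]
# The star import replaced the module with the class of the same name.
print(shadowed is datetime_module.datetime)
print(hasattr(shadowed, "datetime"))
```

Swapping the order of the two imports in the exec string would leave the module bound instead, which is the point about import order.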
This question got answered several weeks ago, but I think I can clarify the answers a little. First some facts.
1: In Python,
    import foo
is almost exactly the same as
    foo = __import__("foo", globals(), locals(), [], -1)
2: When executing code in a function, if Python encounters a variable that is not assigned anywhere in the function, it looks in the global scope.
3: Python has an optimization it uses for functions called "locals". When Python compiles a function, it keeps track of all the variables you assign to. It gives each of these variables a number from a monotonically increasing counter local to the function. When Python runs the function, it creates an array with as many slots as there are local variables, and it assigns each slot a special value that means "has not been assigned to yet", and that's where the values for those variables are stored. If you reference a local that hasn't been assigned to yet, Python sees that special value and throws an UnboundLocalError exception.
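Facts 1 and 3 can be poked at from the interpreter. This is a sketch for Python 3, where the last argument to __import__ is 0 rather than the Python 2 default of -1 shown above. The function sketch and its variable names are invented for illustration, and co_varnames is used here as a view of the numbered slots.

```python
import sys

# Fact 1: an import statement is a call plus an assignment.
# Level 0 means "absolute import" in Python 3.
pprint_module = __import__("pprint", globals(), locals(), [], 0)
same_object = pprint_module is sys.modules["pprint"]


# Fact 3: every name assigned anywhere in the function gets a slot,
# whether or not the assignment has run yet.
def sketch():
    early = missing_so_far  # the slot exists but is still empty here
    missing_so_far = 1
    return early


slots = dict(enumerate(sketch.__code__.co_varnames))
print(slots)

outcome = None
try:
    sketch()
except UnboundLocalError:
    outcome = "UnboundLocalError"
print(same_object, outcome)
```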
The stage is now set. Your "from pprint import pprint" is really a form of assignment. So Python creates a local variable called "pprint" which occludes the global variable. Then, when you refer to "pprint.pprint" in the function, you hit the special value and Python throws the exception. If you didn't have that import statement in the function, Python would use the normal look-in-locals-first-then-look-in-globals resolution and find the pprint module in globals.
To disambiguate this you can use the "global" keyword. Of course by now you've already worked past your problem, and I don't know whether you really needed "global" or if some other approach was called for.
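One possible "other approach", which is my own assumption and not something suggested elsewhere in this thread, is to import under a different local name so the module name is never shadowed. A minimal Python 3 sketch; format_it is a made-up alias, and pformat is used instead of pprint only so the result can be inspected.

```python
import pprint


def main():
    # No assignment to `pprint` in this function, so this is the global module.
    lines = [pprint.pformat('Spam')]
    # The local alias has a different name and shadows nothing.
    from pprint import pformat as format_it
    lines.append(format_it('Eggs'))
    return lines


result = main()
print(result)
```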