I am trying to analyze some messy code that happens to use global variables quite heavily within functions (I am trying to refactor the code so that functions only use local variables). Is there any way to detect global variables within a function?
For example:
def f(x):
    x = x + 1
    z = x + y
    return z

Here the global variable is y, since it isn't given as an argument and isn't created within the function.
I tried to detect global variables within the function using string parsing, but it was getting a bit messy. Is there a better way to do this?
Edit: If anyone is interested, this is the code I am using to detect global variables (based on kindall's answer and Paolo's answer to this question: Capture stdout from a script in Python):
from dis import dis

def capture(f):
    """
    Decorator to capture standard output
    """
    def captured(*args, **kwargs):
        import sys
        from cStringIO import StringIO
        # setup the environment
        backup = sys.stdout
        try:
            sys.stdout = StringIO()      # capture output
            f(*args, **kwargs)
            out = sys.stdout.getvalue()  # read what was captured
        finally:
            sys.stdout.close()  # close the stream
            sys.stdout = backup # restore original stdout
        return out  # captured output wrapped in a string
    return captured

def return_globals(f):
    """
    Prints the disassembly lines that read or write a global in function f
    """
    x = dis_(f)
    for i in x.splitlines():
        if "LOAD_GLOBAL" in i or "STORE_GLOBAL" in i:
            print i

dis_ = capture(dis)
return_globals(f)  # prints the global-access lines of f
dis prints its listing to stdout instead of returning it, so if you want to manipulate the output of dis as a string, you have to use the capture decorator written by Paolo and posted here: Capture stdout from a script in Python
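On Python 3.4 or later, the capture decorator should no longer be necessary: dis.dis accepts a file argument, and dis.Bytecode(f).dis() returns the listing as a string. Below is a minimal sketch assuming Python 3.4+. Opcode offsets and some opcode names differ between versions, so it only relies on the LOAD_GLOBAL/STORE_GLOBAL filtering.

```python
import dis
import io

def f(x):
    x = x + 1
    z = x + y  # 'y' is never assigned here, so it is looked up as a global
    return z

# Option 1: send dis.dis output to an in-memory buffer instead of stdout
buf = io.StringIO()
dis.dis(f, file=buf)
listing = buf.getvalue()

# Option 2: dis.Bytecode renders the listing as a string directly
listing2 = dis.Bytecode(f).dis()

# Keep only the lines that load or store a global, like return_globals does
for line in listing2.splitlines():
    if "LOAD_GLOBAL" in line or "STORE_GLOBAL" in line:
        print(line)
```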
Inspect the bytecode.
from dis import dis
dis(f)
Result:
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 BINARY_ADD
              7 STORE_FAST               0 (x)

  3          10 LOAD_FAST                0 (x)
             13 LOAD_GLOBAL              0 (y)
             16 BINARY_ADD
             17 STORE_FAST               1 (z)

  4          20 LOAD_FAST                1 (z)
             23 RETURN_VALUE
The global variables will have a LOAD_GLOBAL opcode instead of LOAD_FAST. (If the function changes any global variables, there will be STORE_GLOBAL opcodes as well.)
With a little work, you could even write a function that scans the bytecode of a function and returns a list of the global variables it uses. In fact:
from dis import HAVE_ARGUMENT, opmap

def getglobals(func):
    GLOBAL_OPS = opmap["LOAD_GLOBAL"], opmap["STORE_GLOBAL"]
    EXTENDED_ARG = opmap["EXTENDED_ARG"]

    func = getattr(func, "im_func", func)  # unwrap methods
    code = func.func_code
    names = code.co_names

    op = (ord(c) for c in code.co_code)  # iterate over the raw bytecode
    globs = set()
    extarg = 0

    for c in op:
        if c in GLOBAL_OPS:
            # 2-byte little-endian argument: an index into co_names
            globs.add(names[next(op) + next(op) * 256 + extarg])
        elif c == EXTENDED_ARG:
            # supplies the high 16 bits of the next instruction's argument
            extarg = (next(op) + next(op) * 256) * 65536
            continue
        elif c >= HAVE_ARGUMENT:
            # skip the 2-byte argument of any other opcode
            next(op)
            next(op)
        extarg = 0  # EXTENDED_ARG only applies to the very next instruction

    return sorted(globs)

print getglobals(f)  # ['y']
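The loop above decodes Python 2 bytecode, where an argument is two bytes following the opcode. CPython 3.6 switched to fixed 2-byte "wordcode", and 3.11 added inline cache entries, so the hand-rolled decoder does not carry over. A minimal Python 3 sketch follows, assuming 3.4+ for dis.get_instructions, which decodes the instructions (including EXTENDED_ARG) for you. The Python 3 names for im_func and func_code are __func__ and __code__.

```python
import dis

# Opcodes that read, write or delete a module-level (or builtin) name
GLOBAL_OPS = {"LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"}

def getglobals(func):
    """Return the sorted names of globals that func loads, stores or deletes."""
    func = getattr(func, "__func__", func)  # unwrap bound methods
    return sorted({
        ins.argval  # for *_GLOBAL opcodes this is the resolved name
        for ins in dis.get_instructions(func)
        if ins.opname in GLOBAL_OPS
    })

def f(x):
    x = x + 1
    z = x + y
    return z

print(getglobals(f))
```

Like the byte-level version, this also reports builtins such as len, because they are looked up with LOAD_GLOBAL as well.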
As mentioned in the LOAD_GLOBAL documentation:

LOAD_GLOBAL(namei)
    Loads the global named co_names[namei] onto the stack.
This means you can inspect the code object for your function to find globals:
>>> f.__code__.co_names
('y',)
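One caveat is that co_names does not hold only globals. CPython also stores there the names used in attribute lookups and import statements, so co_names can over-report. The minimal sketch below assumes Python 3.4+ and uses a hypothetical function g. In it, co_names contains the attribute name append, while filtering on the global opcodes keeps only limit.

```python
import dis

def g(items):
    items.append(limit)  # 'append' is an attribute lookup, 'limit' is a global
    return items

# co_names holds every name looked up by name: globals AND attributes
print(sorted(g.__code__.co_names))

# Restricting to the *_GLOBAL instructions drops the attribute name
only_globals = sorted({
    ins.argval
    for ins in dis.get_instructions(g)
    if ins.opname in ("LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL")
})
print(only_globals)
```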
Note that this isn't sufficient for nested functions (nor is the dis.dis method in @kindall's answer). In that case, you will need to look at constants too:
# Define a function containing a nested function
>>> def foo():
...     def bar():
...         return some_global
...
# It doesn't contain LOAD_GLOBAL, so .co_names is empty.
>>> import dis
>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (<code object bar at 0x2b70440c84b0, file "<ipython-input-106-77ead3dc3fb7>", line 2>)
              3 MAKE_FUNCTION            0
              6 STORE_FAST               0 (bar)
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE
# Instead, we need to walk the constants to find nested functions:
# (if bar contained a nested function too, we'd need to recurse)
>>> from types import CodeType
>>> for constant in foo.__code__.co_consts:
...     if isinstance(constant, CodeType):
...         print constant.co_names
...
('some_global',)
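The recursion mentioned in the comment can be written as a short helper. This is a minimal Python 3 sketch, assuming 3.4+ for dis.get_instructions. The helper nested_globals and the three-level foo/bar/baz example are hypothetical. Since Python 3.7, dis.dis also disassembles nested code objects recursively, but dis.get_instructions and co_names only cover a single code object.

```python
import dis
from types import CodeType

GLOBAL_OPS = {"LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"}

def nested_globals(code):
    """Collect global names used by a code object and every code object nested in it."""
    names = {
        ins.argval
        for ins in dis.get_instructions(code)
        if ins.opname in GLOBAL_OPS
    }
    # Nested functions and lambdas are compiled to separate code objects
    # stored among the parent's constants, so recurse into those.
    for const in code.co_consts:
        if isinstance(const, CodeType):
            names |= nested_globals(const)
    return names

def foo():
    def bar():
        def baz():
            return other_global
        return some_global
    return bar

print(sorted(nested_globals(foo.__code__)))
```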