I have a package containing subpackages, only one of which I need to import at runtime - but I need to test that they are all valid. Here is my folder structure:
game/
    __init__.py
    game1/
        __init__.py
        constants.py
        ...
    game2/
        __init__.py
        constants.py
        ...
For now the code that runs on boot does:
import pkgutil

import game as _game

# Detect the known games
for importer, modname, ispkg in pkgutil.iter_modules(_game.__path__):
    if not ispkg: continue  # game support modules are packages
    # Equivalent of "from game import <modname>"
    try:
        module = __import__('game', globals(), locals(), [modname], -1)
    except ImportError:
        deprint(u'Error in game support module:', modname, traceback=True)
        continue
    submod = getattr(module, modname)
    if not hasattr(submod, 'fsName') or not hasattr(submod, 'exe'): continue
    _allGames[submod.fsName.lower()] = submod
but this has the disadvantage that all the subpackages are imported, which in turn imports the other modules in each subpackage (such as constants.py etc.), amounting to a few megabytes of garbage. So I want to substitute this code with a test that the submodules are valid (that they would import fine). I guess I should be using eval somehow - but how? Or what should I do?
EDIT: tldr;
I am looking for an equivalent to the core of the loop above:
try:
    probably_eval(game, modname)  # fails iff `from game import modname` fails
                                  # but does _not_ import the module
except:  # I'd rather have a more specific error here but methinks not possible
    deprint(u'Error in game support module:', modname, traceback=True)
    continue
So I want a clear answer on whether an exact equivalent to the import statement, vis-à-vis error checking, exists - without importing the module. That's my question; a lot of answerers and commenters answered different questions.
Maybe you're looking for the py_compile or compileall modules.
Here is the documentation:
https://docs.python.org/2/library/py_compile.html
https://docs.python.org/2/library/compileall.html#module-compileall
You can load the one you want as a module and call it from within your program.
For example:
import py_compile

try:
    py_compile.compile(your_py_file, doraise=True)
    module_ok = True
except py_compile.PyCompileError:
    module_ok = False
If you want to compile the file without importing it (in the current interpreter), you may use py_compile.compile as:
>>> import py_compile
# valid python file
>>> py_compile.compile('/path/to/valid/python/file.py')
# invalid python file
>>> py_compile.compile('/path/to/in-valid/python/file.txt')
Sorry: TypeError: compile() expected string without null bytes
The above code writes the error to sys.stderr. If you want the exception to be raised instead, you will have to set doraise to True (the default is False). Hence, your code will be:
from py_compile import compile, PyCompileError

try:
    compile('/path/to/valid/python/file.py', doraise=True)
    valid_file = True
except PyCompileError:
    valid_file = False
As per py_compile.compile's documentation:
Compile a source file to byte-code and write out the byte-code cache file. The source code is loaded from the file named file. The byte-code is written to cfile, which defaults to file + 'c' ('o' if optimization is enabled in the current interpreter). If dfile is specified, it is used as the name of the source file in error messages instead of file. If doraise is true, a PyCompileError is raised when an error is encountered while compiling file. If doraise is false (the default), an error string is written to sys.stderr, but no exception is raised.
Check to make sure the compiled module is not imported (in current interpreter):
>>> import py_compile, sys
>>> py_compile.compile('/path/to/main.py')
>>> print [key for key in locals().keys() if isinstance(locals()[key], type(sys)) and not key.startswith('__')]
['py_compile', 'sys'] # main not present
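Applied to the loop from the question, a minimal sketch (assuming each game subpackage lives at game/<modname>/__init__.py under game.__path__[0], and reusing the question's deprint helper) could look like this:

import os
import pkgutil
import py_compile

import game as _game

for importer, modname, ispkg in pkgutil.iter_modules(_game.__path__):
    if not ispkg: continue
    # Compile the subpackage's __init__.py without importing it
    init_py = os.path.join(_game.__path__[0], modname, '__init__.py')
    try:
        py_compile.compile(init_py, doraise=True)
    except py_compile.PyCompileError:
        deprint(u'Error in game support module:', modname, traceback=True)
        continue

Note that this only proves the source compiles; it will not catch a failing from .constants import * or any other error raised at import time.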
You can't really do what you want efficiently. In order to see if a package is "valid", you need to run it -- not just check if it exists -- because it could have errors or unmet dependencies.
Using py_compile or compileall will only test whether you can compile a Python file, not whether you can import a module. There is a big difference between the two.
- That approach means you know the actual file structure of the modules -- import foo could represent /foo.py or /foo/__init__.py.
- That approach doesn't guarantee the module is on your interpreter's pythonpath or is the module your interpreter would load. Things will get tricky if you have multiple versions in /site-packages/ or Python is looking in one of the many possible places for a module.
- Just because your file "compiles" doesn't mean it will "run". As a package it could have unmet dependencies or even raise errors.
Imagine this is your python file:
from makebelieve import nothing
raise ValueError("ABORT")
The above will compile, but if you import it... it will raise an ImportError if you don't have makebelieve installed, and will raise a ValueError if you do.
My suggestions are:
- Import the package, then unload the modules. To unload them, just iterate over the keys in sys.modules.keys(). If you're worried about external modules that get loaded, you could override import to log what your packages load. An example of this is in a terrible profiling package I wrote: https://github.com/jvanasco/import_logger [I forgot where I got the idea to override import from. Maybe celery?] As some noted, unloading modules is entirely dependent on the interpreter -- but pretty much every option you have has many drawbacks.
- Use subprocesses to spin up a new interpreter via popen, i.e. popen('python', '-m', 'module_name'). This would have a lot of overhead if you do it for every needed module (the overhead of each interpreter and import), but you could write a ".py" file that imports everything you need and just try to run that. In either case you would have to analyze the output, as importing a "valid" package could cause acceptable errors during execution. I can't recall if the subprocess inherits your environment vars or not, but I believe it does. The subprocess is an entirely new operating system process/interpreter, so the modules will be loaded into that short-lived process's memory. A sketch of this approach follows below.
I believe imp.find_module satisfies at least some of your requirements: https://docs.python.org/2/library/imp.html#imp.find_module
A quick test shows that it does not trigger an import:
>>> import imp
>>> import sys
>>> len(sys.modules)
47
>>> imp.find_module('email')
(None, 'C:\\Python27\\lib\\email', ('', '', 5))
>>> len(sys.modules)
47
>>> import email
>>> len(sys.modules)
70
Here's an example usage in some of my code (which attempts to classify modules): https://github.com/asottile/aspy.refactor_imports/blob/2b9bf8bd2cf22ef114bcc2eb3e157b99825204e0/aspy/refactor_imports/classify.py#L38-L44
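A sketch of how this could slot into the question's loop (reusing the question's deprint helper); note that find_module only locates the submodule on disk - it cannot tell whether the code inside it would import cleanly:

import imp
import pkgutil

import game as _game

for importer, modname, ispkg in pkgutil.iter_modules(_game.__path__):
    if not ispkg: continue
    try:
        # Locate the submodule under game.__path__ without importing it
        f, pathname, description = imp.find_module(modname, _game.__path__)
        if f is not None:
            f.close()  # plain modules return an open file; packages return None
    except ImportError:
        deprint(u'Missing game support module:', modname, traceback=True)
        continue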
We already had a custom importer (disclaimer: I did not write that code, I'm just the current maintainer) whose load_module is:
def load_module(self, fullname):
    if fullname in sys.modules:
        return sys.modules[fullname]
    else:  # set to avoid reimporting recursively
        sys.modules[fullname] = imp.new_module(fullname)
    if isinstance(fullname, unicode):
        filename = fullname.replace(u'.', u'\\')
        ext = u'.py'
        initfile = u'__init__'
    else:
        filename = fullname.replace('.', '\\')
        ext = '.py'
        initfile = '__init__'
    try:
        if os.path.exists(filename + ext):
            with open(filename + ext, 'U') as fp:
                mod = imp.load_source(fullname, filename + ext, fp)
            sys.modules[fullname] = mod
            mod.__loader__ = self
        else:
            mod = sys.modules[fullname]
            mod.__loader__ = self
            mod.__file__ = os.path.join(os.getcwd(), filename)
            mod.__path__ = [filename]
            # init file
            initfile = os.path.join(filename, initfile + ext)
            if os.path.exists(initfile):
                with open(initfile, 'U') as fp:
                    code = fp.read()
                exec compile(code, initfile, 'exec') in mod.__dict__
        return mod
    except Exception as e:  # wrap in ImportError a la python2 - will keep
        # the original traceback even if import errors nest
        print 'fail', filename + ext
        raise ImportError, u'caused by ' + repr(e), sys.exc_info()[2]
So I thought I could replace the parts that access the sys.modules cache with overridable methods that, in my override, would leave that cache alone:
@@ -48,2 +55,2 @@ class UnicodeImporter(object):
-        if fullname in sys.modules:
-            return sys.modules[fullname]
+        if self._check_imported(fullname):
+            return self._get_imported(fullname)
@@ -51 +58 @@ class UnicodeImporter(object):
-            sys.modules[fullname] = imp.new_module(fullname)
+            self._add_to_imported(fullname, imp.new_module(fullname))
@@ -64 +71 @@ class UnicodeImporter(object):
-            sys.modules[fullname] = mod
+            self._add_to_imported(fullname, mod)
@@ -67 +74 @@ class UnicodeImporter(object):
-            mod = sys.modules[fullname]
+            mod = self._get_imported(fullname)
and define:
class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = {}

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _get_imported(self, fullname):
        try:
            return sys.modules[fullname]
        except KeyError:
            return self._modules_to_discard[fullname]

    def _add_to_imported(self, fullname, mod):
        self._modules_to_discard[fullname] = mod

    @classmethod
    def cleanup(cls):
        cls._modules_to_discard.clear()
Then I added the importer to sys.meta_path and was good to go:
importer = sys.meta_path[0]
try:
    if not hasattr(sys, 'frozen'):
        sys.meta_path = [fake_importer()]
    perform_the_imports()  # see question
finally:
    fake_importer.cleanup()
    sys.meta_path = [importer]
Right? Wrong!
Traceback (most recent call last):
  File "bash\bush.py", line 74, in __supportedGames
    module = __import__('game',globals(),locals(),[modname],-1)
  File "Wrye Bash Launcher.pyw", line 83, in load_module
    exec compile(code, initfile, 'exec') in mod.__dict__
  File "bash\game\game1\__init__.py", line 29, in <module>
    from .constants import *
ImportError: caused by SystemError("Parent module 'bash.game.game1' not loaded, cannot perform relative import",)
Huh? I am currently importing that very same module. Well, the answer is probably in import's docs:
If the module is not found in the cache, then sys.meta_path is searched (the specification for sys.meta_path can be found in PEP 302).
That's not completely to the point, but my guess is that the statement from .constants import * looks up sys.modules to check if the parent module is there, and I see no way of bypassing that (note that our custom loader is using the builtin import mechanism for modules; mod.__loader__ = self is set after the fact).
So I updated my FakeImporter to use the sys.modules cache and then clean that up.
class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = set()

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _add_to_imported(self, fullname, mod):
        super(FakeUnicodeImporter, self)._add_to_imported(fullname, mod)
        self._modules_to_discard.add(fullname)

    @classmethod
    def cleanup(cls):
        for m in cls._modules_to_discard: del sys.modules[m]
This however blew up in a new way - or rather two ways:
- A reference to the game/ package was held in the bash top package instance in sys.modules:

bash\
    __init__.py
    the_code_in_question_is_here.py
    game\
        ...

because game is imported as bash.game. That reference held references to all the game1, game2, ... subpackages, so those were never garbage collected.
- A reference to another module (brec) was held as bash.brec by the same bash module instance. This reference was imported as from .. import brec in game\game1 without triggering an import, to update SomeClass. However, in yet another module an import of the form from ...brec import SomeClass did trigger an import, and another instance of the brec module ended up in sys.modules. That instance had a non-updated SomeClass and blew up with an AttributeError.
Both were fixed by manually deleting those references - so gc collected all the modules (5 MB of RAM out of 75) and the from .. import brec did trigger an import (this from ... import foo vs from ...foo import bar difference warrants a question of its own).
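A hypothetical sketch of that manual cleanup, assuming the attribute names game and brec on the top-level bash package (names taken from above) - not the exact code of the fix:

import gc
import sys

bash_pkg = sys.modules['bash']
if hasattr(bash_pkg, 'game'):
    del bash_pkg.game  # drop the bash.game attribute so the subpackages can be collected
if hasattr(bash_pkg, 'brec'):
    del bash_pkg.brec  # drop the stale bash.brec reference
gc.collect()  # let gc reclaim the fake-imported modules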
The moral of the story is that it is possible, but:
- the package and subpackages should only reference each other
- all references to external modules/packages should be deleted from the top-level package's attributes
- the reference to the package itself should be deleted from the top-level package's attributes
If this sounds complicated and error-prone, it is - but at least now I have a much cleaner view of the interdependencies and their perils - time to address that.
This post was sponsored by PyDev's debugger - I found the gc module very useful in grokking what was going on (tips from here). Of course there were a lot of variables that belonged to the debugger, and that complicated things.