How to concatenate multiple Python source files into one file

Published 2020-02-29 11:59

Question:

(Assume that: application start-up time is absolutely critical; my application is started a lot; my application runs in an environment in which importing is slower than usual; many files need to be imported; and compilation to .pyc files is not available.)

I would like to concatenate all the Python source files that define a collection of modules into a single new Python source file.

I would like the result of importing the new file to be as if I imported one of the original files (which would then import some more of the original files, and so on).

Is this possible?

Here is a rough, manual simulation of what a tool might produce when fed the source files for modules 'bar' and 'baz'. You would run such a tool prior to deploying the code.

__file__ = 'foo.py'

import sys
import types

def _module(name):
    mod = types.ModuleType(name)
    mod.__file__ = __file__
    sys.modules[name] = mod
    return mod

def _bar_module():

    def hello():
        print 'Hello World! BAR'

    mod = _module('foo.bar')
    mod.hello = hello
    return mod

bar = _bar_module()
del _bar_module

def _baz_module():

    def hello():
        print 'Hello World! BAZ'

    mod = _module('foo.bar.baz')
    mod.hello = hello
    return mod

baz = _baz_module()
del _baz_module

And now you can:

from foo.bar import hello
hello()

This code doesn't take into account things like import statements and dependencies. Is there any existing code that will assemble source files using this, or some other, technique?

This is a very similar idea to the tools used to assemble and optimise JavaScript files before sending them to the browser, where the latency of multiple HTTP requests hurts performance. In this Python case, it's the latency of importing hundreds of Python source files at startup that hurts.

Answer 1:

If this is on google app engine as the tags indicate, make sure you are using this idiom

def main(): 
    #do stuff
if __name__ == '__main__':
    main()

GAE doesn't restart your app on every request unless the .py has changed; it just runs main() again.

This trick lets you write CGI-style apps without the startup performance hit.

AppCaching

If a handler script provides a main() routine, the runtime environment also caches the script. Otherwise, the handler script is loaded for every request.



Answer 2:

I think that, due to the precompilation of Python files and some system caching, any speed-up you eventually get won't be measurable.
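Whether the speed-up is measurable can be checked directly. A rough sketch (the module name 'json' is just an illustration; evicting a module from sys.modules and re-importing is not a rigorous benchmark, since submodules and the OS file cache stay warm):

```python
import sys
import time
import importlib

def time_fresh_import(name):
    # Evict the module so the import machinery does real work again,
    # then time a single fresh import of it.
    sys.modules.pop(name, None)
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start

print('%.6fs' % time_fresh_import('json'))
```

Repeating this over the hundreds of modules in question would show whether import latency is actually the bottleneck.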



Answer 3:

Doing this is unlikely to yield any performance benefits. You're still importing the same amount of Python code, just in fewer modules - and you're sacrificing all modularity for it.

A better approach would be to modify your code and/or libraries to only import things when needed, so that a minimum of required code is loaded for each request.
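One way to sketch that "import only when needed" approach is a lazy proxy that defers the real import until first use. This `LazyModule` class is a hypothetical helper, not part of any standard library:

```python
import importlib

class LazyModule(object):
    # Proxy that performs the real import only when an attribute
    # is first accessed, keeping startup cost near zero.
    def __init__(self, name):
        self._name = name
        self._mod = None

    def __getattr__(self, attr):
        if self._mod is None:
            self._mod = importlib.import_module(self._name)
        return getattr(self._mod, attr)

# Nothing is imported at startup; the real import happens here:
json = LazyModule('json')
print(json.dumps({'a': 1}))
```

Requests that never touch the proxied module never pay for its import.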



Answer 4:

Leaving aside the question of whether this technique would actually speed things up in your environment, and assuming it would, here is what I would do.

I would make a list of all my modules e.g. my_files = ['foo', 'bar', 'baz']

I would then use os.path utilities to read all lines in all files under the source directory and write them all into a new file, filtering out all import foo|bar|baz lines, since all the code is now within a single file.

Of course, finally appending the main() from __init__.py (if there is one) at the end of the file.
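A minimal sketch of that script (the flat directory layout, module list, and filtering regex are assumptions; real code would also need to handle packages, `from foo import ...` of individual names, and name clashes between modules):

```python
import os
import re

def concatenate(src_dir, module_names, out_path):
    # Matches lines like "import foo" or "from bar import x"
    # that refer to one of the modules being merged.
    skip = re.compile(
        r'^\s*(import|from)\s+(%s)\b'
        % '|'.join(re.escape(n) for n in module_names))
    with open(out_path, 'w') as out:
        for name in module_names:
            path = os.path.join(src_dir, name + '.py')
            with open(path) as src:
                for line in src:
                    if skip.match(line):
                        continue  # drop imports of merged modules
                    out.write(line)
            out.write('\n')
```

Running `concatenate(src_dir, ['foo', 'bar', 'baz'], 'merged.py')` would produce the single deployable file.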