How to organize multiple python files into a singl

Posted 2019-02-04 02:06

Question:

Is there a way to use __init__.py to organize multiple files into a module?

Reason: Modules are easier to use than packages, because they don't have as many layers of namespace.

Normally it makes a package, this I get. Problem is with a package, 'import thepackage' gives me an empty namespace. Users must then either use "from thepackage import *" (frowned upon) or know exactly what is contained and manually pull it out into a usable namespace.

What I want to have is the user do 'import thepackage' and have nice clean namespaces that look like this, exposing functions and classes relevant to the project for use.

current_module
\
  doit_tools/
  \
   - (class) _hidden_resource_pool
   - (class) JobInfo
   - (class) CachedLookup
   - (class) ThreadedWorker
   - (Fn) util_a
   - (Fn) util_b
   - (Fn) gather_stuff
   - (Fn) analyze_stuff

The maintainer's job would be to avoid defining the same name in different files, which should be easy when the project is small like mine is.

It would also be nice if people could do 'from doit_tools import JobInfo' and have it retrieve the class, rather than a module containing the class.

This is easy if all my code is in one gigantic file, but I like to organize when things start getting big. What I have on disk looks sort of like this:

place_in_my_python_path/
  doit_tools/
    __init__.py
    JobInfo.py
      - class JobInfo:
    NetworkAccessors.py
      - class _hidden_resource_pool:
      - class CachedLookup:
      - class ThreadedWorker:
    utility_functions.py
      - def util_a()
      - def util_b()
    data_functions.py
      - def gather_stuff()
      - def analyze_stuff()

I only separate them so my files aren't huge and unnavigable. They are all related, though someone (possibly me) may want to use the classes by themselves without importing everything.

I've read a number of suggestions in various threads, here's what happens for each suggestion I can find for how to do this:

If I do not use an __init__.py, I cannot import anything because Python doesn't descend into the folder from sys.path.

If I use a blank __init__.py, when I import doit_tools it's an empty namespace with nothing in it. None of my files imported, which makes it more difficult to use.

If I list the submodules in __all__, I can use the (frowned upon?) from thing import * syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use from x import * instead of import x, (2) manually reshuffle classes until they can reasonably obey line width style constraints.

If I add from thatfile import X statements to __init__.py, I get closer but I have namespace conflicts (?) and extra namespaces for things I didn't want to be in there. In the below example, you'll see that:

  1. The class JobInfo overwrote the module object named JobInfo because their names were the same. Somehow Python can figure this out, because JobInfo is of type <class 'doit_tools.JobInfo.JobInfo'>. (doit_tools.JobInfo is a class, but doit_tools.JobInfo.JobInfo is that same class... this is tangled and seems very bad, but doesn't seem to break anything.)
  2. Each filename made its way into the doit_tools namespace, which makes it more confusing to look through if anyone is looking at the contents of the module. I want doit_tools.utility_functions.py to hold some code, not define a new namespace.


current_module
\
  doit_tools/
  \
   - (module) JobInfo
      \
       - (class) JobInfo
   - (class) JobInfo
   - (module) NetworkAccessors
      \
       - (class) CachedLookup
       - (class) ThreadedWorker
   - (class) CachedLookup
   - (class) ThreadedWorker
   - (module) utility_functions
      \
       - (Fn) util_a
       - (Fn) util_b
   - (Fn) util_a
   - (Fn) util_b
   - (module) data_functions
      \
       - (Fn) gather_stuff
       - (Fn) analyze_stuff
   - (Fn) gather_stuff
   - (Fn) analyze_stuff

Also someone importing just the data abstraction class would get something different than they expect when they do 'from doit_tools import JobInfo':

current_namespace
\
 JobInfo (module)
  \
   -JobInfo (class)

instead of:

current_namespace
\
 - JobInfo (class)

So, is this just a wrong way to organize Python code? If not, what is a correct way to split related code up but still collect it in a module-like way?

Maybe the best case scenario is that doing 'from doit_tools import JobInfo' is a little confusing for someone using the package?

Maybe a python file called 'api' so that people using the code do the following?:

import doit_tools.api
from doit_tools.api import JobInfo

============================================

Examples in response to comments:

Take the following package contents, inside folder 'foo' which is in python path.

foo/__init__.py

__all__ = ['doit','dataholder','getSomeStuff','hold_more_data','SpecialCase']
from .another_class import doit
from .another_class import dataholder
from .descriptive_name import getSomeStuff
from .descriptive_name import hold_more_data
from .specialcase import SpecialCase

foo/specialcase.py

class SpecialCase:
    pass

foo/descriptive_name.py

def getSomeStuff():
    pass

class hold_more_data(object):
    pass

foo/another_class.py

def doit():
    print("I'm a function.")

class dataholder(object):
    pass

Do this:

>>> import foo
>>> for thing in dir(foo): print(thing)
... 
SpecialCase
__builtins__
__doc__
__file__
__name__
__package__
__path__
another_class
dataholder
descriptive_name
doit
getSomeStuff
hold_more_data
specialcase

another_class and descriptive_name are there cluttering things up, and also have extra copies of e.g. doit() underneath their namespaces.

If I have a class named Data inside a file named Data.py, then 'from Data import Data' gives me a namespace conflict: the name Data in the current namespace refers to the class, yet the module Data that contains it apparently has to be in the current namespace too. (But Python seems to be able to handle this.)
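The shadowing described above can be reproduced in isolation. The following is a minimal sketch, assuming a throwaway package called shadow_demo (a hypothetical name, not part of the original project) written to a temporary directory. It suggests that the package attribute ends up bound to the class, while the submodule stays reachable under its dotted name in sys.modules.

```python
import inspect
import os
import sys
import tempfile

# Create a temporary package "shadow_demo" (hypothetical name) containing
# a module JobInfo.py that defines a class of the same name.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shadow_demo")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "JobInfo.py"), "w") as f:
    f.write("class JobInfo(object):\n    pass\n")
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .JobInfo import JobInfo\n")

sys.path.insert(0, root)
import shadow_demo

# The from-import rebinds the name JobInfo in the package namespace,
# so the attribute now refers to the class rather than the submodule.
attr_is_class = inspect.isclass(shadow_demo.JobInfo)

# The submodule object is still registered under its full dotted name.
submodule = sys.modules["shadow_demo.JobInfo"]
submodule_is_module = inspect.ismodule(submodule)
same_class = submodule.JobInfo is shadow_demo.JobInfo

print(attr_is_class, submodule_is_module, same_class)
```

Nothing is duplicated here: there is one class object and one module object, and only the name binding inside the package changes.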

Answer 1:

You can sort of do it, but it's not really a good idea and you're fighting against the way Python modules/packages are supposed to work. By importing appropriate names in __init__.py you can make them accessible in the package namespace. By deleting module names you can make them inaccessible. (For why you need to delete them, see this question). So you can get close to what you want with something like this (in __init__.py):

from .another_class import doit
from .another_class import dataholder
from .descriptive_name import getSomeStuff
from .descriptive_name import hold_more_data
del another_class, descriptive_name
__all__ = ['doit', 'dataholder', 'getSomeStuff', 'hold_more_data']

However, this will break subsequent attempts to import package.another_class. In general, you can't import anything from a package.module without making package.module accessible as an importable reference to that module (although __all__ lets you keep the module name out of from package import *).
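To see the effect of deleting a submodule name, here is a minimal sketch that builds a temporary package on disk. The package name flat_demo is hypothetical, and only one of the submodules from the example is recreated. It illustrates that the package namespace looks flat afterwards, while the submodule remains loaded but is no longer reachable as an attribute.

```python
import os
import sys
import tempfile

# Build a throwaway package "flat_demo" (hypothetical name) on disk.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "flat_demo")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "another_class.py"), "w") as f:
    f.write("def doit():\n    return \"I'm a function.\"\n")

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(
        "from .another_class import doit\n"
        "del another_class\n"          # remove the submodule name again
        "__all__ = ['doit']\n"
    )

sys.path.insert(0, root)
import flat_demo

# Only the re-exported function is visible among the public names.
names = [n for n in dir(flat_demo) if not n.startswith("_")]
print(names)  # ['doit']

# The submodule was loaded and is still cached by the import system ...
still_loaded = "flat_demo.another_class" in sys.modules
# ... but it is no longer an attribute of the package object.
reachable_as_attribute = hasattr(flat_demo, "another_class")
```

This is the trade-off the answer warns about: code that later writes flat_demo.another_class.doit() would fail on the attribute lookup even though the module is loaded.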

More generally, by splitting up your code by class/function you are working against the Python package/module system. A Python module should generally contain stuff you want to import as a unit. It's not uncommon to import submodule components directly in the top-level package namespace for convenience, but the reverse --- trying to hide the submodules and allow access to their contents only through the top-level package namespace --- is going to lead to problems. In addition, there is nothing to be gained by trying to "cleanse" the package namespace of the modules. Those modules are supposed to be in the package namespace; that's where they belong.



Answer 2:

Define __all__ = ['names', 'that', 'are', 'public'] in the __init__.py e.g.:

__all__ = ['foo']

from ._subpackage import foo

Real-world example: numpy/__init__.py.


You have some misconception about how Python packages work:

If I do not use an __init__.py, I cannot import anything because Python doesn't descend into the folder from sys.path.

You need an __init__.py file in Python versions older than Python 3.3 to mark a directory as containing a Python package.

If I use a blank __init__.py, when I import doit_tools it's an empty namespace with nothing in it. None of my files imported, which makes it more difficult to use.

It doesn't prevent the import:

from doit_tools import your_module

It works as expected.
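A minimal sketch of this behaviour, assuming a temporary package named blank_demo with a single submodule your_module (both names are placeholders for illustration): the empty __init__.py leaves the package namespace empty after a plain import, but importing the submodule explicitly still works and binds it on the package.

```python
import os
import sys
import tempfile

# Temporary package "blank_demo" (hypothetical name) with an empty __init__.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "blank_demo")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "your_module.py"), "w") as f:
    f.write("def util_a():\n    return 'a'\n")

sys.path.insert(0, root)
import blank_demo

# A plain "import blank_demo" does not load any submodule.
before = hasattr(blank_demo, "your_module")

# Importing the submodule explicitly loads it and binds it on the package.
from blank_demo import your_module
after = hasattr(blank_demo, "your_module")

result = your_module.util_a()
print(before, after, result)
```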

If I list the submodules in __all__, I can use the (frowned upon?) from thing import * syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use from x import * instead of import x, (2) manually reshuffle classes until they can reasonably obey line width style constraints.

(1) Your users (in most cases) should not use from your_package import * outside an interactive Python shell.

(2) You could use () to break a long import line:

from package import (function1, Class1, Class2, ..snip many other names..,
                     ClassN)

If I add from thatfile import X statements to __init__.py, I get closer but I have namespace conflicts (?) and extra namespaces for things I didn't want to be in there.

It is up to you to resolve namespace conflicts (different objects with the same name). A name can refer to any object: integer, string, package, module, class, function, etc. Python can't know which object you might prefer, and even if it could, it would be inconsistent to ignore some name bindings in this particular case when name bindings are honored in all other cases.

To mark names as non-public you could prefix them with _ e.g., package/_nonpublic_module.py.
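Combining the two suggestions from this answer (an explicit __all__ plus an underscore-prefixed implementation module) might look like the sketch below. The package name api_demo and the module name _impl are assumptions made for illustration. The point is that both a filtered dir() and a star import then show only the intended public name.

```python
import os
import sys
import tempfile

# Temporary package "api_demo" (hypothetical) with a non-public module "_impl".
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "api_demo")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "_impl.py"), "w") as f:
    f.write("def foo():\n    return 42\n\ndef _helper():\n    return None\n")
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("__all__ = ['foo']\n\nfrom ._impl import foo\n")

sys.path.insert(0, root)
import api_demo

# Names without a leading underscore: the submodule "_impl" is filtered out
# by convention, leaving only the re-exported function.
public = [n for n in dir(api_demo) if not n.startswith("_")]

# A star import honours __all__, so only "foo" is copied over.
star_ns = {}
exec("from api_demo import *", star_ns)
star_names = sorted(n for n in star_ns if not n.startswith("__"))

print(public, star_names)
```

The submodule is still importable as api_demo._impl, so nothing is hidden from the import system. The underscore only signals that it is not part of the public interface.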



Answer 3:

There are perfectly valid reasons to hide the sub-structure of a package (not only when debugging), among them convenience and efficiency. When building a rapid prototype with a package, it is extremely annoying to have to interrupt your train of thought just to look up which exact sub-module a specific function or class lives in.

When everything is available at the top level of a package, the idiom:

python -c 'import pkg; help(pkg)'

displays the entire help, not just some measly module names.

You can always turn off the sub-module imports for production code, or clean up the package modules after development.

The following is the best way I have come up with so far. It maximizes convenience while trying not to suppress valid errors. See also the full source with doctest documentation.


Define package name and sub-modules to be imported to avoid error-prone duplication:

_package_ = 'flat_export'
_modules_ = ['sub1', 'sub2', 'sub3']

Use relative imports when available (this is imperative, see is_importing_package):

_loaded = False
if is_importing_package(_package_, locals()):
    for _module in _modules_:
        exec ('from .' + _module + ' import *')
    _loaded = True
    del(_module)

Try importing the package, including __all__.
This happens when executing a module file as script with the package in the search path (e.g. python flat_export/__init__.py)

if not _loaded:
    try:
        exec('from ' + _package_ + ' import *')
        exec('from ' + _package_ + ' import __all__')
        _loaded = True
    except (ImportError):
        pass

As a last resort, try importing the sub-modules directly.
This happens when executing a module file as script inside the package directory without the package in the search path (e.g. cd flat_export; python __init__.py).

if not _loaded:
    for _module in _modules_:
        exec('from ' + _module + ' import *')
    del(_module)

Construct __all__ (leaving out modules), unless it has been imported before:

if not globals().get('__all__'):
    # __all__ is only defined at this point if the package-level import above set it
    __all__ = []
    _module_type = type(__import__('sys'))
    for _sym, _val in sorted(locals().items()):
        if not _sym.startswith('_') and not isinstance(_val, _module_type):
            __all__.append(_sym)
    del(_sym)
    del(_val)
    del(_module_type)
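The filtering rule used to build __all__ can be tried on its own. The sketch below is not the answer's code: it wraps the same idea in a hypothetical helper named public_names and runs it on a hand-made namespace dictionary, dropping underscore-prefixed names and module objects.

```python
import math
import types


def public_names(namespace):
    """Return sorted names that are neither private nor module objects."""
    return sorted(
        name
        for name, value in namespace.items()
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    )


# A hand-made namespace standing in for a package's globals (illustrative only).
example = {
    "sqrt": math.sqrt,                    # public function: kept
    "math": math,                         # module object: dropped
    "_hidden": 1,                         # private name: dropped
    "JobInfo": type("JobInfo", (), {}),   # public class: kept
}

exported = public_names(example)
print(exported)  # ['JobInfo', 'sqrt']
```

Inside an __init__.py the same helper could be called as public_names(globals()), assuming it is acceptable for every public non-module name to become part of the exported interface.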

Here is the function is_importing_package:

def is_importing_package(_package_, locals_, dummy_name=None):
    """:returns: True, if relative package imports are working.

    :param _package_: the package name (unfortunately, __package__
      does not work, since it is None, when loading ``:(``).
    :param locals_: module local variables for auto-removing function
      after use.
    :param dummy_name: dummy module name (default: 'dummy').

    Tries to do a relative import from an empty module `.dummy`. This
    avoids any secondary errors, other than::

        ValueError: Attempted relative import in non-package
    """

    success = False
    if _package_:
        import sys
        dummy_name = dummy_name or 'dummy'
        dummy_module = _package_ + '.' + dummy_name
        if dummy_module not in sys.modules:
            import types
            # Register an empty placeholder module so the relative import can resolve.
            sys.modules[dummy_module] = types.ModuleType(dummy_module)
        try:
            exec('from .' + dummy_name + ' import *')
            success = True
        except Exception:
            pass
    if not 'sphinx.ext.autodoc' in __import__('sys').modules:
        del(locals_['is_importing_package'])
    return success


Answer 4:

Python is not Java. A module's file name does not need to match the name of a class it contains. In fact, PEP 8 recommends short, all-lowercase names for modules.

Also, "from math import sqrt" adds only sqrt to the namespace, not math.
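This is easy to check with a small sketch that runs the import in a fresh namespace dictionary (the variable name ns is arbitrary):

```python
import sys

# Run the from-import in an empty namespace and inspect which names it binds.
ns = {}
exec("from math import sqrt", ns)

has_sqrt = "sqrt" in ns
has_math = "math" in ns
print(has_sqrt, has_math)  # True False

# The math module itself was still loaded by the import system.
math_loaded = "math" in sys.modules
```

The same rule applies to packages: 'from doit_tools import JobInfo' binds only the name JobInfo in the importing namespace, whatever object that name refers to inside the package.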