
How can I access sibling packages in a maintainabl

Published 2019-08-20 15:35

Question:

I often end up in a situation where one package needs to use a sibling package. I want to clarify that I'm not asking about how Python allows you to import sibling packages, which has been asked many times. Instead, my question is about a best practice for writing maintainable code.

  1. Let's say we have a tools package, and the function tools.parse_name() depends on tools.split_name(). Initially, both might live in the same file where everything is easy:

    # tools/__init__.py
    from .name import parse_name, split_name
    
    # tools/name.py
    def parse_name(name):
      splits = split_name(name)  # Can access from same file.
      return do_something_with_splits(splits)
    
    def split_name(name):
      return do_something_with_name(name)
    
  2. Now, at some point we decide that the functions have grown and split them into two files:

    # tools/__init__.py
    from .parse_name import parse_name
    from .split_name import split_name
    
    # tools/parse_name.py
    import tools
    
    def parse_name(name):
      splits = tools.split_name(name)   # Won't work because of import order!
      return do_something_with_splits(splits)
    
    # tools/split_name.py
    def split_name(name):
      return do_something_with_name(name)
    

    The problem is that parse_name.py can't simply import the tools package that it is itself part of. At least, doing so won't let it use the names that tools/__init__.py imports after its own line.

  3. The technical solution is to import tools.split_name rather than tools:

    # tools/__init__.py
    from .parse_name import parse_name
    from .split_name import split_name
    
    # tools/parse_name.py
    import tools.split_name as tools_split_name
    
    def parse_name(name):
      splits = tools_split_name.split_name(name)   # Works but ugly!
      return do_something_with_splits(splits)
    
    # tools/split_name.py
    def split_name(name):
      return do_something_with_name(name)
    

This solution technically works, but it quickly becomes messy if more than one sibling package is used. Moreover, renaming the package tools to utilities would be a nightmare, since all the module aliases would have to change as well.

I would like to avoid importing functions directly and instead import packages, so that it is clear where a function came from when reading the code. How can I handle this situation in a readable and maintainable way?
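Whether the ordering in step 2 actually breaks depends on when tools.split_name is looked up. The following experiment is a sketch under stated assumptions: it writes the step 2 files into a temporary directory, uses the hypothetical package name tools_import_order, and replaces the placeholder bodies with a plain str.split(). A module-level lookup in parse_name.py runs while tools/__init__.py is still executing its first import, so split_name is not bound yet; a lookup inside the function body is only resolved when the function is called.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Step 2 of the question under a hypothetical package name; the
# do_something_with_* placeholders are replaced by a plain str.split().
FILES = {
    "__init__.py": (
        "from .parse_name import parse_name\n"
        "from .split_name import split_name\n"
    ),
    "parse_name.py": (
        "import tools_import_order\n"
        "\n"
        "# Module-level lookup: runs while __init__.py is still on its first\n"
        "# line, so split_name has not been bound on the package yet.\n"
        "SEEN_AT_IMPORT = hasattr(tools_import_order, 'split_name')\n"
        "\n"
        "def parse_name(name):\n"
        "    # Function-level lookup: resolved only when parse_name() is called.\n"
        "    return tools_import_order.split_name(name)\n"
    ),
    "split_name.py": (
        "def split_name(name):\n"
        "    return name.split()\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    # Write the package files, then make the temporary root importable.
    for filename, source in FILES.items():
        target = Path(root) / "tools_import_order" / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        tools = importlib.import_module("tools_import_order")
    finally:
        sys.path.remove(root)

seen_at_import = sys.modules["tools_import_order.parse_name"].SEEN_AT_IMPORT
print(seen_at_import)                  # False
print(tools.parse_name("John Smith"))  # ['John', 'Smith']
```

So the import order bites whenever parse_name.py needs a later sibling at import time (module-level code, default arguments, decorators); a purely call-time lookup happens to work once the package has finished importing, but only because nothing touches the attribute earlier.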

Answer 1:

I could literally ask you what syntax you need and provide it. I won't, but you can adapt the syntax yourself, too.

"The problem is that parse_name.py can't just import the tools package which is part of itself."

That looks like a wrong and strange thing to do, indeed.

"At least, this won't allow it to use tools listed below its own line in tools/__init__.py"

Agreed, but again, we don't need that if things are structured properly.

To simplify the discussion and reduce the degrees of freedom, I assumed several things in the example below.

You can then adapt it to different but similar scenarios by modifying the code to fit your import syntax requirements.

I give some hints for changes at the end.

Scenario:

You want to build an import package named tools.

You have a lot of functions in there that you want to make available to client code in client.py, which uses the tools package by importing it. For simplicity, I make all the functions (from every subpackage) available directly in the tools namespace by using the from ... import * form. That is dangerous and should be changed in a real scenario to prevent name clashes with and between subpackage names.

You organize the functions by grouping them into import packages inside your tools package (subpackages).

Each subpackage has (by definition) its own folder with at least an __init__.py inside. I chose to put each subpackage's code in a single module in its folder, next to the __init__.py. You can have more modules and/or inner packages.

.
├── client.py
└── tools
    ├── __init__.py
    ├── splitter
    │   ├── __init__.py
    │   └── splitter.py
    └── formatter
        ├── __init__.py
        └── formatter.py

I keep the __init__.py files empty, except for the outer one, which is responsible for making all the wanted names available in the tools namespace to importing client code. This can be changed, of course.

# tools/__init__.py
# note that relative imports avoid using the outer package name,
# which is good if you later change your mind about its name

from .splitter.splitter import *
from .formatter.formatter import *



# client.py
# this is user code

import tools

text = "foo bar"

splits = tools.split(text) # the two funcs came 
                           # from different subpackages
text = tools.titlefy(text)

print(splits)
print(text)




# tools/formatter/formatter.py
from ..splitter import splitter  # sibling subpackage tools.splitter,
                                 # module splitter

def titlefy(name):
    splits = splitter.split(name)
    return ' '.join([s.title() for s in splits])




# tools/splitter/splitter.py
def split(name):
    return name.split()
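The layout above can be checked end to end with a small self-contained script. This is a sketch under a few assumptions: it writes the same files into a temporary directory instead of a real project, and it names the package tools_layout (a hypothetical name) so it cannot collide with an installed tools module. Because every import inside the package is relative, none of the file contents mention the package name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The files from the tree above, under the hypothetical package name
# "tools_layout". All inner imports are relative, so no file mentions it.
FILES = {
    "__init__.py": (
        "from .splitter.splitter import *\n"
        "from .formatter.formatter import *\n"
    ),
    "splitter/__init__.py": "",
    "splitter/splitter.py": (
        "def split(name):\n"
        "    return name.split()\n"
    ),
    "formatter/__init__.py": "",
    "formatter/formatter.py": (
        "from ..splitter import splitter\n"
        "\n"
        "def titlefy(name):\n"
        "    splits = splitter.split(name)\n"
        "    return ' '.join([s.title() for s in splits])\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    # Write every file, creating the subpackage folders as needed.
    for relpath, source in FILES.items():
        target = Path(root) / "tools_layout" / relpath
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # Equivalent of "import tools" in client.py.
        tools = importlib.import_module("tools_layout")
    finally:
        sys.path.remove(root)

splits = tools.split("foo bar")  # the two functions come
text = tools.titlefy("foo bar")  # from different subpackages
print(splits)  # ['foo', 'bar']
print(text)    # Foo Bar
```

Renaming the package then only touches the directory name and the client's import line.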

You can tailor the import syntax to your taste; this also answers your comment about what the imports look like.

The from form is required for relative imports. Otherwise, use absolute imports by prefixing the path with tools.
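A quick way to see this rule is to compile a few import statements. This sketch only checks syntax with the built-in compile(), so the module paths in it (taken from the example layout) do not need to exist:

```python
# Check which import forms are syntactically valid. compile() only parses
# the statements, so the module paths do not have to exist.
candidates = {
    "from ..splitter import splitter": True,      # relative, from form
    "from tools.splitter import splitter": True,  # absolute, from form
    "import tools.splitter.splitter": True,       # absolute, plain form
    "import ..splitter": False,                   # relative, plain form
}

results = {}
for source in candidates:
    try:
        compile(source, "<example>", "exec")
        results[source] = True
    except SyntaxError:
        results[source] = False

for source, ok in results.items():
    print(f"{source!r}: {'valid' if ok else 'SyntaxError'}")
```

The last statement is rejected because a relative path can only appear after from.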

__init__.py files can be used to adjust the names exposed to importing code, or to initialize the package. They can also be empty, or start out as the only file in the subpackage, holding all the code, and later be split into other modules, although I don't like this "everything in __init__.py" approach as much.

They are just code that runs on import.

You can also avoid repeated names in import paths (such as splitter.splitter) by using different names, by putting everything in __init__.py and dropping the module with the repeated name, by using aliases in the __init__.py imports, or by assigning names there. You can also limit what gets exported when an importer uses the * form by listing names in an __all__ list.
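To make the __all__ point concrete: in the layout above, formatter.py binds the module-level name splitter, so without __all__ the star import in tools/__init__.py copies that name into the package and shadows the splitter subpackage attribute. The sketch below builds both variants in a temporary directory under hypothetical package names and compares them; the only assumed change is adding __all__ = ['titlefy'] to formatter.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_files(with_all):
    """The answer's layout; optionally add __all__ to formatter.py."""
    formatter = "from ..splitter import splitter\n"
    if with_all:
        # Only titlefy is exported by "from .formatter.formatter import *".
        formatter += "__all__ = ['titlefy']\n"
    formatter += (
        "def titlefy(name):\n"
        "    return ' '.join([s.title() for s in splitter.split(name)])\n"
    )
    return {
        "__init__.py": (
            "from .splitter.splitter import *\n"
            "from .formatter.formatter import *\n"
        ),
        "splitter/__init__.py": "",
        "splitter/splitter.py": "def split(name):\n    return name.split()\n",
        "formatter/__init__.py": "",
        "formatter/formatter.py": formatter,
    }


def build_and_import(root, pkgname, files):
    """Write files under root/pkgname, then import the package."""
    for relpath, source in files.items():
        target = Path(root) / pkgname / relpath
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)
    importlib.invalidate_caches()
    return importlib.import_module(pkgname)


with tempfile.TemporaryDirectory() as root:
    sys.path.insert(0, root)
    try:
        # Hypothetical package names, one per variant.
        plain = build_and_import(root, "tools_star_plain", make_files(False))
        limited = build_and_import(root, "tools_star_all", make_files(True))
    finally:
        sys.path.remove(root)

# Without __all__, the package's "splitter" name ends up bound to the
# splitter.py module (copied in by the star import), not to the subpackage.
shadowed = plain.splitter is not sys.modules["tools_star_plain.splitter"]
# With __all__, the subpackage attribute is left alone.
intact = limited.splitter is sys.modules["tools_star_all.splitter"]
print(shadowed, intact)  # True True
```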

A change you might want for safer readability is to force client.py to specify the subpackage when using names, that is:

name1 = tools.splitter.split('foo bar')

To do that, change the outer __init__.py to import only the submodules, like this:

from .splitter import splitter
from .formatter import formatter
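With that __init__.py, client code has to go through the subpackage and the flat names disappear. The sketch below checks this by again writing the files into a temporary directory, under the hypothetical package name tools_explicit:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The answer's layout with the alternative outer __init__.py, which imports
# only the submodules. "tools_explicit" is a hypothetical package name.
FILES = {
    "__init__.py": (
        "from .splitter import splitter\n"
        "from .formatter import formatter\n"
    ),
    "splitter/__init__.py": "",
    "splitter/splitter.py": "def split(name):\n    return name.split()\n",
    "formatter/__init__.py": "",
    "formatter/formatter.py": (
        "from ..splitter import splitter\n"
        "\n"
        "def titlefy(name):\n"
        "    return ' '.join([s.title() for s in splitter.split(name)])\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    for relpath, source in FILES.items():
        target = Path(root) / "tools_explicit" / relpath
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        tools = importlib.import_module("tools_explicit")
    finally:
        sys.path.remove(root)

# Client code now names the subpackage explicitly.
name1 = tools.splitter.split('foo bar')
name2 = tools.formatter.titlefy('foo bar')
print(name1, name2)           # ['foo', 'bar'] Foo Bar
# The flat names are no longer exported.
print(hasattr(tools, 'split'))  # False
```

Strictly speaking, tools.splitter then refers to the module splitter.py: the from-import in __init__.py rebinds that name over the subpackage attribute, which is why tools.splitter.split works.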


Answer 2:

I'm not proposing this to be actually used in practice, but just for fun, here is a solution using pkgutil and inspect:

import importlib.util
import inspect
import os
import pkgutil


def import_siblings(filepath):
    """Import and combine names from all sibling packages of a file."""
    path = os.path.dirname(os.path.abspath(filepath))
    merged = type('MergedModule', (object,), {})
    for finder, module, _ in pkgutil.iter_modules([path]):
        if module + '.py' == os.path.basename(filepath):
            continue  # Skip the calling module itself.
        # Use find_spec() and exec_module(): the finders' older
        # find_module() method was removed in Python 3.12.
        spec = finder.find_spec(module)
        sibling = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(sibling)
        for name, member in inspect.getmembers(sibling):
            if name.startswith('__'):
                continue
            if hasattr(merged, name):
                message = "Two sibling packages define the same name '{}'."
                raise KeyError(message.format(name))
            setattr(merged, name, member)
    return merged

The example from the question becomes:

# tools/__init__.py
from .parse_name import parse_name
from .split_name import split_name

# tools/parse_name.py
# (assumes import_siblings() from above is importable here)
tools = import_siblings(__file__)

def parse_name(name):
    splits = tools.split_name(name)  # Same usage as if this were an external module.
    return do_something_with_splits(splits)

# tools/split_name.py
def split_name(name):
    return do_something_with_name(name)
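The import_siblings() approach can be exercised the same way. This sketch assumes two things the answer leaves open: the find_spec()/exec_module() variant of import_siblings() (since the finders' find_module() method was removed in Python 3.12), and a hypothetical top-level module siblings_helper through which parse_name.py imports it. The placeholder do_something_with_* calls are replaced by a plain str.split(), and the package is called tools_siblings, also a hypothetical name.

```python
import importlib
import importlib.util
import inspect
import os
import pkgutil
import sys
import tempfile
import types
from pathlib import Path


def import_siblings(filepath):
    """Import and combine names from all sibling modules of a file."""
    path = os.path.dirname(os.path.abspath(filepath))
    merged = type('MergedModule', (object,), {})
    for finder, module, _ in pkgutil.iter_modules([path]):
        if module + '.py' == os.path.basename(filepath):
            continue  # skip the calling module itself
        spec = finder.find_spec(module)
        sibling = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(sibling)
        for name, member in inspect.getmembers(sibling):
            if name.startswith('__'):
                continue
            if hasattr(merged, name):
                message = "Two sibling packages define the same name '{}'."
                raise KeyError(message.format(name))
            setattr(merged, name, member)
    return merged


# Expose import_siblings as a hypothetical top-level module "siblings_helper"
# so that parse_name.py can import it.
helper = types.ModuleType("siblings_helper")
helper.import_siblings = import_siblings
sys.modules["siblings_helper"] = helper

FILES = {
    "__init__.py": (
        "from .parse_name import parse_name\n"
        "from .split_name import split_name\n"
    ),
    "parse_name.py": (
        "from siblings_helper import import_siblings\n"
        "tools = import_siblings(__file__)\n"
        "\n"
        "def parse_name(name):\n"
        "    return tools.split_name(name)\n"
    ),
    "split_name.py": "def split_name(name):\n    return name.split()\n",
}

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "tools_siblings"
    pkg.mkdir()
    for filename, source in FILES.items():
        (pkg / filename).write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        tools = importlib.import_module("tools_siblings")
    finally:
        sys.path.remove(root)

print(tools.parse_name("Ada Lovelace"))  # ['Ada', 'Lovelace']
```

Note that each sibling is executed as a separate, unnamed copy outside sys.modules, so relative imports inside a sibling would not work with this approach.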