What does “import” prefer - .pyd (.so) or .py?

Posted 2020-03-18 07:38

Question:

I have two files in the same directory, a compiled library file and a source file:

.
├── a.py
└── a.pyd

It looks like import a actually imports the a.pyd module, but I can't find any official documentation guaranteeing that.

Does anyone know the import ordering for different file types?

The same question applies to Unix Python extensions (.so).

Answer 1:

In a typical Python installation, the ExtensionFileLoader class has precedence over the SourceFileLoader that is used for .py files. It's the ExtensionFileLoader which handles imports of .pyd files, and on a Windows machine you will find .pyd registered in importlib.machinery.EXTENSION_SUFFIXES (note: on Linux/macOS it will have .so in there instead).

So in the case of a name collision within the same directory (which means a "tie" when looking through sys.path in order), the a.pyd file takes precedence over the a.py file. You can verify this by creating empty a.pyd and a.py files: the statement import a then attempts the DLL load (and fails, of course).
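A minimal sketch of that check, which asks the import machinery which file it would pick without actually loading anything (so the empty extension file never triggers a failed DLL load). The helper name winning_loader is made up for illustration, and the sketch assumes a regular CPython build where importlib.machinery.EXTENSION_SUFFIXES is non-empty:

```python
import importlib.machinery
import os
import tempfile


def winning_loader(module_name="a"):
    """Return (loader class name, file name) chosen for module_name.

    Hypothetical helper: creates an empty source file and an empty
    extension file side by side in a temporary directory, then asks
    PathFinder which one it would import.
    """
    # '.pyd' on Windows, some '.so' variant on Linux/macOS.
    ext_suffix = importlib.machinery.EXTENSION_SUFFIXES[0]
    with tempfile.TemporaryDirectory() as tmp:
        for suffix in (".py", ext_suffix):
            open(os.path.join(tmp, module_name + suffix), "w").close()
        # find_spec only locates the module; nothing is loaded or executed.
        spec = importlib.machinery.PathFinder.find_spec(module_name, [tmp])
        return type(spec.loader).__name__, os.path.basename(spec.origin)


if __name__ == "__main__":
    print(winning_loader())
```

On a default installation the first element of the result should name the extension loader rather than the source loader.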

To see the precedence in the CPython sources, look at importlib._bootstrap_external._get_supported_file_loaders:

def _get_supported_file_loaders():
    """Returns a list of file-based module loaders.
    Each item is a tuple (loader, suffixes).
    """
    extensions = ExtensionFileLoader, _imp.extension_suffixes()
    source = SourceFileLoader, SOURCE_SUFFIXES
    bytecode = SourcelessFileLoader, BYTECODE_SUFFIXES
    return [extensions, source, bytecode]  # <-- extensions before source!
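If you would rather inspect the order in your own interpreter than read the sources, something like the following sketch should work. Note that _get_supported_file_loaders is a private helper, so this assumes its name and return shape match the excerpt above; both may change between Python versions. The wrapper name loader_order is made up for illustration.

```python
import importlib._bootstrap_external as _bootstrap_external


def loader_order():
    """List (loader class name, suffixes) pairs in search order.

    Relies on a private CPython helper; treat this as a debugging aid,
    not a stable API.
    """
    return [
        (loader.__name__, list(suffixes))
        for loader, suffixes in _bootstrap_external._get_supported_file_loaders()
    ]


if __name__ == "__main__":
    for name, suffixes in loader_order():
        print(name, suffixes)
```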

For a doc reference, see http://www.python.org/doc/essays/packages/

What If I Have a Module and a Package With The Same Name?

You may have a directory (on sys.path) which has both a module spam.py and a subdirectory spam that contains an __init__.py (without the __init__.py, a directory is not recognized as a package). In this case, the subdirectory has precedence, and importing spam will ignore the spam.py file, loading the package spam instead. If you want the module spam.py to have precedence, it must be placed in a directory that comes earlier in sys.path.

(Tip: the search order is determined by the list of suffixes returned by the function imp.get_suffixes(). Usually the suffixes are searched in the following order: ".so", "module.so", ".py", ".pyc". Directories don't explicitly occur in this list, but precede all entries in it.)

This doc doesn't explicitly mention ".pyd", but that's the Windows equivalent of ".so". I've just tested on a Windows machine, and indeed '.pyd' appears before '.py' in the suffix list.

Note that the reference given above is very old! Since this essay was written, the import system has been completely revamped, and the underlying machinery exposed for users (you can mutate the sys.meta_path to register your own loaders or change precedence, for example). So it would be possible now to customize for '.py' to be preferred to '.pyd', and it doesn't matter much what imp.get_suffixes() has to say about anything (actually, that function is deprecated now). A default Python installation would not do that, of course, and the default precedence remains the same as the reference above has mentioned.
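As a rough sketch of such a customization: the example below does not touch sys.meta_path but uses a closely related hook, sys.path_hooks, registering a FileFinder whose loader list puts source files first. The context manager name source_first is made up for illustration, and this is only one possible way to do it, not something a default installation does.

```python
import contextlib
import sys
from importlib.machinery import (
    BYTECODE_SUFFIXES,
    EXTENSION_SUFFIXES,
    SOURCE_SUFFIXES,
    ExtensionFileLoader,
    FileFinder,
    SourceFileLoader,
    SourcelessFileLoader,
)


@contextlib.contextmanager
def source_first():
    """Temporarily prefer .py files over extension modules (hypothetical helper)."""
    # Same loaders as the default, but with source and bytecode listed
    # before extensions.
    loader_details = [
        (SourceFileLoader, SOURCE_SUFFIXES),
        (SourcelessFileLoader, BYTECODE_SUFFIXES),
        (ExtensionFileLoader, EXTENSION_SUFFIXES),
    ]
    hook = FileFinder.path_hook(*loader_details)
    sys.path_hooks.insert(0, hook)
    # Drop cached finders so directories are re-examined with the new hook.
    sys.path_importer_cache.clear()
    try:
        yield
    finally:
        sys.path_hooks.remove(hook)
        sys.path_importer_cache.clear()
```

Inside a with source_first(): block, import a in a directory containing both files should resolve to a.py. Modules that are already in sys.modules are not affected.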



Answer 2:

Thanks to wim for the answer.

import importlib.util  # a bare "import importlib" does not guarantee that importlib.util is loaded
print(importlib.util.find_spec('a'))

This prints:

ModuleSpec(name='a', loader=<_frozen_importlib_external.ExtensionFileLoader object at 0x02A79EF0>, origin='a.pyd')

This doesn't show the search order between .pyd and .py.

But at least it tells me which file is actually imported as the module.