Python shared object module naming convention

Posted 2019-01-14 21:03

Question:

I have written a Python module in C++ and built it as a shared object library, and it worked fine. But while figuring all that out, I noticed (via strace) that Python looks for a few different variations when import is called. In particular, when I say import foo, Python searches for, in order:

  • foo (a directory)
  • foo.so
  • foomodule.so
  • foo.py
  • foo.pyc

This was all pretty understandable except for foomodule.so. Why does Python look for everything both as name.so and namemodule.so? Is it some historical artifact? I searched quite a bit and came up with no explanation at all, and am left wondering if I should name my module foomodule.so instead of foo.so. My system seems to have some existing Python modules following each convention, so I can't help but wonder if the different names imply something.

Answer 1:

This is actually platform-dependent: Python tries different suffixes depending on the operating system. Here is the initialization of the suffix table in import.c:

#ifdef HAVE_DYNAMIC_LOADING
    memcpy(filetab, _PyImport_DynLoadFiletab,
           countD * sizeof(struct filedescr));
#endif
    memcpy(filetab + countD, _PyImport_StandardFiletab,
           countS * sizeof(struct filedescr));
    filetab[countD + countS].suffix = NULL;

    _PyImport_Filetab = filetab;

So it concatenates two lists, _PyImport_DynLoadFiletab and _PyImport_StandardFiletab. The latter is the simpler one: it is defined as [".py", ".pyw", ".pyc"] in the same file (the second entry is only present on Windows). _PyImport_DynLoadFiletab is defined in the various dynload_<platform>.c files. On Unix-based systems its value is [".so", "module.so"]; on Cygwin it is [".dll", "module.dll"], on OS/2 it is [".pyd", ".dll"], and on Windows it is simply [".pyd"].
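To make the concatenation concrete, here is a small Python model of it. It is only a sketch: the platform keys ("unix", "cygwin", "os2", "windows") and the helper names are hypothetical, the suffix values are taken from the paragraph above, and the real C code stores struct filedescr entries (suffix, open mode, file type) rather than bare strings.

```python
from typing import Dict, List

# Suffix tables as summarized above for the Python 2 era import.c.
# These literals are copied from the prose, not read from a live interpreter.
DYNLOAD_FILETAB: Dict[str, List[str]] = {
    "unix": [".so", "module.so"],
    "cygwin": [".dll", "module.dll"],
    "os2": [".pyd", ".dll"],
    "windows": [".pyd"],
}


def standard_filetab(platform: str) -> List[str]:
    """_PyImport_StandardFiletab: ".pyw" is only present on Windows."""
    if platform == "windows":
        return [".py", ".pyw", ".pyc"]
    return [".py", ".pyc"]


def build_filetab(platform: str) -> List[str]:
    """Mimic import.c: dynamic-loading suffixes first, then the standard ones."""
    return DYNLOAD_FILETAB[platform] + standard_filetab(platform)


def candidate_files(name: str, platform: str) -> List[str]:
    """File names tried, in order, for `import name` within one directory
    (the check for a package directory called `name` happens before this)."""
    return [name + suffix for suffix in build_filetab(platform)]


print(candidate_files("foo", "unix"))
# ['foo.so', 'foomodule.so', 'foo.py', 'foo.pyc']
```

For "unix" the result reproduces the order the question observed with strace, after the check for a foo directory.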

I went through the source code history and finally arrived at this change from 1999 that apparently added "module.so" as a possible suffix: http://hg.python.org/cpython-fullhistory/diff/8efa37a770c6/Python/importdl.c. It was originally added for NeXTSTEP (the system that eventually became Mac OS X), and only for particular linking settings. I don't know this OS, so it is hard to tell why it was done; I suspect it was simply to prevent naming conflicts. For example, a framework library foo.so might already be loaded, and the OS won't allow loading another library with the same name. So foomodule.so was a compromise that allowed a Python module named foo to exist nevertheless.

Edit: The paragraph above was wrong; I didn't go far enough back in history (thanks to senderle for pointing that out). In fact, the interesting change appears to be http://hg.python.org/cpython-fullhistory/diff/2230/Python/import.c from 1994, which is where a new module naming scheme (foo.so) was added as an alternative to the old scheme (foomodule.so). I guess the old form became deprecated at some point, given that support for it has been removed on some platforms, such as Windows, in one of the numerous rewrites of that code. Note that even when it was first introduced, the short module name version was listed first, meaning that it was already the preferred variant.
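For the practical side of the question (whether to name the file foomodule.so), it may help to ask the running interpreter which extension suffixes it actually accepts. A minimal sketch for Python 3, using importlib.machinery.EXTENSION_SUFFIXES; the helper name extension_filenames is hypothetical. As far as I can tell, current CPython 3 releases no longer list a "module.so" style suffix at all, so the short foo.so form (or an ABI-tagged variant of it) is the safe choice.

```python
import importlib.machinery


def extension_filenames(name):
    """Hypothetical helper: the shared-object file names this interpreter
    accepts for an extension module called `name`, in the order listed by
    importlib.machinery.EXTENSION_SUFFIXES."""
    return [name + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


if __name__ == "__main__":
    for filename in extension_filenames("foo"):
        print(filename)
    # Does this interpreter still accept the old "foomodule.so" style?
    legacy = [s for s in importlib.machinery.EXTENSION_SUFFIXES if s.startswith("module")]
    print("legacy 'module' suffixes:", legacy or "none")
```

The exact output depends on the platform and interpreter build, which is why the sketch reads the list at run time instead of hard-coding it.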

Edit 2: I searched the 1994 mailing list/newsgroup archives to see whether this change was discussed anywhere. It doesn't look like it was; Guido van Rossum seems to have implemented it without telling anyone.



Answer 2:

This is merely a guess, but I assume it is related to the following passage from "Extending Python with C or C++":

Begin by creating a file spammodule.c. (Historically, if a module is called spam, the C file containing its implementation is called spammodule.c; if the module name is very long, like spammify, the module name can be just spammify.c.)

I suppose this convention extends to the name of the .so file. That conjecture is further supported by section 1.5 of the same document.
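Whichever spelling the file uses, the name inside the binary is tied to the module name, not the file name. As far as I know, CPython 2 looks up an entry point called init<name> and CPython 3 looks up PyInit_<name>, where <name> is the last dotted component of the imported name. So under Python 2, foo.so and foomodule.so would both have to export initfoo for import foo to succeed. The helper below (init_symbol is a hypothetical name) just spells that rule out for ASCII module names.

```python
def init_symbol(fullname, python3=True):
    """Hypothetical helper: the C entry point CPython looks up when importing
    the extension module `fullname` (ASCII names only).

    Only the last dotted component matters; the file name the module was
    loaded from (foo.so vs foomodule.so) plays no part.
    """
    shortname = fullname.rpartition(".")[2]
    prefix = "PyInit_" if python3 else "init"
    return prefix + shortname


print(init_symbol("foo", python3=False))  # initfoo
print(init_symbol("foo"))                 # PyInit_foo
print(init_symbol("pkg.foo"))             # PyInit_foo
```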


Based on Wladimir's excellent discovery, I've found the first reference to module.so as a suffix. It's from a patch by "Bill" (Bill Jansson?) to support dynamic loading of SunOS libraries. Clearly the module-as-suffix convention began before the use of .so shared libraries, and when .so libraries were adopted, the convention was simply maintained.

I think Wladimir is right, though: the interesting change is the one in which the short module name convention was adopted. That confirms my guess that the long module name was the earlier convention.