Library expects symbol in flat namespace although

Posted 2019-07-20 06:41

Question:

I load Python dynamically with dlopen and RTLD_LOCAL to avoid collisions with another library that happens to contain a few symbols with the same names. Running the MVCE below on macOS with Xcode fails because _PyBuffer_Type is expected in the global namespace.

Traceback (most recent call last):
  File "...lib/python2.7/ctypes/__init__.py", line 10, in <module>
    from _ctypes import Union, Structure, Array
ImportError: dlopen(...lib/python2.7/lib-dynload/_ctypes.so, 2):
    Symbol not found: _PyBuffer_Type
  Referenced from: ...lib/python2.7/lib-dynload/_ctypes.so
  Expected in: flat namespace
 in ...lib/python2.7/lib-dynload/_ctypes.so
Program ended with exit code: 255

But why? Does RTLD_LOCAL override the two-level namespace?

I used otool -hV to check that _ctypes.so was compiled with the two-level namespace option. As I understand it, symbol resolution then needs the library name plus the symbol name itself. Why does it expect _PyBuffer_Type in the flat namespace, and why can't it find it? Note the TWOLEVEL flag at the far right of the output:

> otool -hV /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_ctypes.so
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64  X86_64        ALL  0x00      BUNDLE    14       1536   NOUNDEFS DYLDLINK TWOLEVEL
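
As a rough illustration of what the TWOLEVEL entry in that output means: otool decodes the flags field of the Mach-O header, and TWOLEVEL corresponds to the MH_TWOLEVEL bit. The sketch below decodes a few of those bits in C++. The constant values mirror <mach-o/loader.h>, but the helper names describeMachFlags and usesTwoLevelNamespace are hypothetical and only four flags are handled.

```cpp
#include <cstdint>
#include <string>

// Flag values as defined in <mach-o/loader.h>, repeated here so the sketch
// builds without the macOS SDK headers.
constexpr std::uint32_t kMhNoUndefs  = 0x1;    // MH_NOUNDEFS
constexpr std::uint32_t kMhDyldLink  = 0x4;    // MH_DYLDLINK
constexpr std::uint32_t kMhTwoLevel  = 0x80;   // MH_TWOLEVEL
constexpr std::uint32_t kMhForceFlat = 0x100;  // MH_FORCE_FLAT

// Hypothetical helper: names the handled flag bits, lowest bit first,
// separated by single spaces. Unhandled bits are silently ignored.
std::string describeMachFlags(std::uint32_t flags)
{
    std::string out;
    auto add = [&](std::uint32_t bit, const char* name) {
        if (flags & bit) {
            if (!out.empty()) out += ' ';
            out += name;
        }
    };
    add(kMhNoUndefs,  "NOUNDEFS");
    add(kMhDyldLink,  "DYLDLINK");
    add(kMhTwoLevel,  "TWOLEVEL");
    add(kMhForceFlat, "FORCE_FLAT");
    return out;
}

// True when the image was linked with two-level namespace bindings.
bool usesTwoLevelNamespace(std::uint32_t flags)
{
    return (flags & kMhTwoLevel) != 0;
}
```

A flags value of 0x85 (NOUNDEFS, DYLDLINK and TWOLEVEL combined) corresponds to the flag list shown in the otool output for _ctypes.so.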

Any idea what's going on here?

MVCE

This can be copied into a new Xcode project; simply compile and run.

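#include </System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/Python.h>
#include <dlfcn.h>
#include <cstdio>

int main(int argc, const char * argv[])
{
    // RTLD_LOCAL keeps Python's symbols out of the global namespace.
    auto* dl = dlopen("/System/Library/Frameworks/Python.framework/Versions/2.7/Python", RTLD_LOCAL | RTLD_NOW);
    if (dl == nullptr)
    {
        // Report why the load failed instead of exiting silently with 0.
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Load is just a macro to hide dlsym(..): it looks up `name` in dl and
    // casts the result to the type declared for that name in Python.h.
    #define Load(name)  ((decltype(::name)*)dlsym(dl, # name))

    Load(Py_SetPythonHome)("/System/Library/Frameworks/Python.framework/Versions/2.7");
    Load(Py_Initialize)();

    // Importing ctypes makes Python dlopen _ctypes.so, which is where it fails.
    auto* module = Load(PyImport_ImportModule)("ctypes");
    if (module == nullptr)
    {
        Load(PyErr_Print)();
        dlclose(dl);
        return -1;
    }

    Py_DECREF(module);
    Load(Py_Finalize)();
    dlclose(dl);
    return 0;
}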

Answer 1:

This question and your related RTLD_GLOBAL question both concern how the dynamic loader resolves undefined symbols in the shared libraries that it loads. I was hoping to find an explicit documentation reference that would explain what you're seeing, but I have not been able to find one. Nonetheless, I can make an observation that may explain what's happening.

If we run with verbosity, we can see that the python library is attempting to load two shared libraries before it fails:

bash-3.2$ PYTHONVERBOSE=1 ./main 2>&1 | grep -i dlopen
dlopen(".../python2.7/lib-dynload/_locale.so", 2);
dlopen(".../python2.7/lib-dynload/_ctypes.so", 2);

Given that the first one succeeds, we know that generally the dynamic loader is resolving the undefined symbols against the namespace of the calling library. And in fact, as you note in the comments of your other question, this even works when there are two versions of the python library, i.e. the dlopen()s done by the python libraries resolve against their respective namespaces. So far, this sounds like exactly what you want. But, why is _ctypes.so failing to load?

We know that _PyModule_GetDict is the symbol that was causing _locale.so to fail to load in your other question; and that it obviously works here. We also know that the symbol _PyBuffer_Type is failing here. What's the difference between these two symbols? Looking them up in the python library:

bash-3.2$ nm libpython2.7.dylib | grep _PyModule_GetDict
00000000000502c0 T _PyModule_GetDict
bash-3.2$ nm libpython2.7.dylib | grep _PyBuffer_Type
0000000000154f90 D _PyBuffer_Type

_PyModule_GetDict is a Text (code) symbol, whereas _PyBuffer_Type is a Data symbol.
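
To make the T versus D distinction concrete, here is a minimal C++ sketch that splits an nm output line into its three fields and names the symbol type. It assumes the "<address> <type> <name>" layout shown above, so it rejects undefined-symbol lines, which carry no address. The names NmSymbol, parseNmLine and describeSymbolType are hypothetical.

```cpp
#include <sstream>
#include <string>

// One defined symbol as printed by nm: "<address> <type> <name>".
struct NmSymbol {
    std::string address;
    char type = '?';
    std::string name;
};

// Parses a three-field nm line. Returns false when the line does not have
// an address, a single-letter type and a name, in that order.
bool parseNmLine(const std::string& line, NmSymbol& out)
{
    std::istringstream in(line);
    std::string type;
    if (!(in >> out.address >> type >> out.name) || type.size() != 1)
        return false;
    out.type = type[0];
    return true;
}

// Maps the nm type letter to a short description. Only the two kinds
// discussed in this answer are named; everything else is "other".
const char* describeSymbolType(char type)
{
    switch (type) {
        case 'T': case 't': return "text (code)";
        case 'D': case 'd': return "data";
        default:            return "other";
    }
}
```

Fed the two nm lines above, this classifies _PyModule_GetDict as text (code) and _PyBuffer_Type as data.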

Therefore, based on this empirical data, I suspect the dynamic loader will resolve undefined symbols against RTLD_LOCAL code symbols of the calling library, but not RTLD_LOCAL data symbols. Perhaps somebody can point to an explicit reference.