How to tidy/fix PyCXX's creation of new-style

Posted 2020-04-19 06:37

Question:

I've nearly finished rewriting a C++ Python wrapper (PyCXX).

The original allows old and new style extension classes, but also allows one to derive from the new-style classes:

import test

# ok
a = test.new_style_class()

# also ok
class Derived( test.new_style_class ):
    def __init__( self ):
        test.new_style_class.__init__( self )

    def derived_func( self ):
        print( 'derived_func' )
        super().func_noargs()

    def func_noargs( self ):
        print( 'derived func_noargs' )

d = Derived()

The code is convoluted and appears to contain errors (see the related question "Why does PyCXX handle new-style classes in the way it does?").

My question is: What is the rationale/justification for PyCXX's convoluted mechanism? Is there a cleaner alternative?

I will attempt to detail below where I am at with this enquiry. First I will try and describe what PyCXX is doing at the moment, then I will describe what I think could maybe be improved.


When the Python runtime encounters d = Derived(), it does PyObject_Call( ob ) where ob is the PyTypeObject for NewStyleClass. I will write ob as NewStyleClass_PyTypeObject.

That PyTypeObject has been constructed in C++ and registered using PyType_Ready.

PyObject_Call will invoke type_call(PyTypeObject *type, PyObject *args, PyObject *kwds), returning an initialised Derived instance i.e.

PyObject* derived_instance = type_call(NewStyleClass_PyTypeObject, NULL, NULL)

Something like this.

(All of this comes from http://eli.thegreenplace.net/2012/04/16/python-object-creation-sequence by the way, thanks Eli!)

type_call does essentially:

PyObject* obj = type->tp_new(type, args, kwds);
type->tp_init(obj, args, kwds);
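The same two-step sequence is visible from pure Python, where __new__ and __init__ correspond to the tp_new and tp_init slots. The sketch below is only an illustration: Widget, calls and type_call_sketch are made-up names, and the real type_call is C code that does more (argument checking, error handling).

```python
calls = []  # records the order in which the two hooks run


class Widget:
    def __new__(cls, *args, **kwargs):
        # Counterpart of tp_new: allocate the object, nothing more.
        calls.append("new")
        return super().__new__(cls)

    def __init__(self, value=0):
        # Counterpart of tp_init: fill in the already-allocated object.
        calls.append("init")
        self.value = value


def type_call_sketch(cls, *args, **kwargs):
    """Rough Python rendering of type_call; not the real implementation."""
    obj = cls.__new__(cls, *args, **kwargs)
    # tp_init is skipped when tp_new returns something that is not an
    # instance of the requested type.
    if isinstance(obj, cls):
        type(obj).__init__(obj, *args, **kwargs)
    return obj


w = type_call_sketch(Widget, 7)
print(calls, w.value)
```

Calling Widget(7) directly goes through the same two hooks in the same order.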

And our C++ wrapper has inserted functions into the tp_new and tp_init slots of NewStyleClass_PyTypeObject something like this:

typeobject.set_tp_new( extension_object_new );
typeobject.set_tp_init( extension_object_init );

:
    static PyObject* extension_object_new( PyTypeObject* subtype, 
                                              PyObject* args, PyObject* kwds )
    {
        PyObject* pyob = subtype->tp_alloc(subtype,0);

        Bridge* o = reinterpret_cast<Bridge *>( pyob );

        o->m_pycxx_object = nullptr;

        return pyob;
    }

    static int extension_object_init( PyObject* _self, 
                                            PyObject* args, PyObject* kwds )
    {
        Bridge* self{ reinterpret_cast<Bridge*>(_self) };

        // NOTE: observe this is where we invoke the constructor, 
        //       but indirectly (i.e. through final)
        self->m_pycxx_object = new FinalClass{ self, args, kwds };

        return 0;
    }

Note that we need to bind together the Python Derived instance and its corresponding C++ class instance. (Why? Explained below, see 'X'). To do that we are using:

struct Bridge
{
    PyObject_HEAD // <-- a PyObject
    ExtObjBase* m_pycxx_object;
};

Now this bridge raises a question. I'm very suspicious of this design.

Note how memory was allocated for this new PyObject:

        PyObject* pyob = subtype->tp_alloc(subtype,0);

And then we typecast this pointer to Bridge, and use the 4 or 8 (sizeof(void*)) bytes immediately following the PyObject to point to the corresponding C++ class instance (this gets hooked up in extension_object_init as can be seen above).

Now for this to work we require:

a) subtype->tp_alloc(subtype,0) must be allocating an extra sizeof(void*) bytes

b) The PyObject doesn't require any memory beyond sizeof(PyObject_HEAD), because if it did then this would be conflicting with the above pointer
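As a rough check of the sizes involved, the Bridge layout can be mirrored with ctypes. This is a sketch under assumptions: the header is modelled as an opaque byte array of object.__basicsize__ bytes (the size of a bare PyObject on the running interpreter), and HEAD_SIZE and pointer_size are names introduced only for this example.

```python
import ctypes

HEAD_SIZE = object.__basicsize__  # size of a bare PyObject on this build


class Bridge(ctypes.Structure):
    # Mirror of the C++ Bridge struct: the object header, then one pointer.
    _fields_ = [
        ("head", ctypes.c_char * HEAD_SIZE),   # stands in for PyObject_HEAD
        ("m_pycxx_object", ctypes.c_void_p),   # pointer to the C++ instance
    ]


pointer_size = ctypes.sizeof(ctypes.c_void_p)
print("header bytes:", HEAD_SIZE)
print("pointer offset:", Bridge.m_pycxx_object.offset)
print("sizeof(Bridge):", ctypes.sizeof(Bridge))
```

If the layout matches, the pointer sits directly after the header, so condition (a) amounts to tp_basicsize being at least sizeof(Bridge).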

One major question I have at this point is: Can we guarantee that the PyObject that the Python runtime has created for our derived_instance does not overlap into Bridge's ExtObjBase* m_pycxx_object field?

I will attempt to answer it: we are the ones determining how much memory gets allocated. When we create NewStyleClass_PyTypeObject we feed in how much memory we want this PyTypeObject to allocate for a new instance of this type:

template< TEMPLATE_TYPENAME FinalClass >
class ExtObjBase : public FuncMapper<FinalClass> , public ExtObjBase_noTemplate
{
protected:
    static TypeObject& typeobject()
    {
        static TypeObject* t{ nullptr };
        if( ! t )
            t = new TypeObject{ sizeof(FinalClass), typeid(FinalClass).name() };
                   /*           ^^^^^^^^^^^^^^^^^ this is the bug BTW!
                        The C++ Derived class instance never gets deposited
                        in the memory allocated by the Python runtime
                        (controlled by this parameter).

                        This value should be sizeof(Bridge) -- as pointed out
                        in the answer to the question linked above. */
        return *t;
    }
:
}

class TypeObject
{
private:
    PyTypeObject* table;

    // these tables fit into the main table via pointers
    PySequenceMethods*       sequence_table;
    PyMappingMethods*        mapping_table;
    PyNumberMethods*         number_table;
    PyBufferProcs*           buffer_table;

public:
    PyTypeObject* type_object() const
    {
        return table;
    }

    // NOTE: if you define one sequence method you must define all of them except the assigns

    TypeObject( size_t size_bytes, const char* default_name )
        : table{ new PyTypeObject{} }  // {} sets to 0
        , sequence_table{}
        , mapping_table{}
        , number_table{}
        , buffer_table{}
    {
        PyObject* table_as_object = reinterpret_cast<PyObject* >( table );

        *table_as_object = PyObject{ _PyObject_EXTRA_INIT  1, NULL }; 
        // ^ py_object_initializer -- NULL because type must be init'd by user

        table_as_object->ob_type = _Type_Type();

        // QQQ table->ob_size = 0;
        table->tp_name              = const_cast<char *>( default_name );
        table->tp_basicsize         = size_bytes;
        table->tp_itemsize          = 0; // sizeof(void*); // so as to store extra pointer

        table->tp_dealloc           = ...

You can see it going in as table->tp_basicsize.

But now it seems clear to me that PyObject-s generated from NewStyleClass_PyTypeObject will never require additional allocated memory.

Which means that this whole Bridge mechanism is unnecessary.

PyCXX's original technique looks good by comparison: it uses PyObject as a base class of NewStyleClassCXXClass and initialises this base so that the Python runtime's PyObject for d = Derived() is in fact this base. This allows seamless typecasting.

Whenever Python runtime calls a slot from NewStyleClass_PyTypeObject, it will be passing a pointer to d's PyObject as the first parameter, and we can just typecast back to NewStyleClassCXXClass. <-- 'X' (referenced above)

So really my question is: why don't we just do this? Is there something special about deriving from NewStyleClass that forces extra allocation for the PyObject?

I realise I don't understand the creation sequence in the case of a derived class. Eli's post didn't cover that.

I suspect this may be connected with the fact that

    static PyObject* extension_object_new( PyTypeObject* subtype, ...

^ this variable name is 'subtype'. I don't understand this, and I wonder if this may hold the key.

EDIT: I thought of one possible explanation for why PyCXX is using sizeof(FinalClass) for initialisation. It might be a relic from an idea that got tried and discarded. i.e. If Python's tp_new call allocates enough space for the FinalClass (which has the PyObject as base), maybe a new FinalClass can be generated on that exact location using 'placement new', or some cunning reinterpret_cast business. My guess is this might have been tried, found to pose some problem, worked around, and the relic got left behind.

Answer 1:

PyCXX is not convoluted. It does have two bugs, but they can be easily fixed without requiring significant changes to the code.

When creating a C++ wrapper for the Python API, one encounters a problem: the C++ object model and the Python new-style object model are very different. One fundamental difference is that C++ has a single constructor that both creates and initializes the object, while Python has two stages: tp_new creates the object and performs minimal initialization (or just returns an existing object), and tp_init performs the rest of the initialization.

PEP 253, which you should probably read in its entirety, says:

The difference in responsibilities between the tp_new() slot and the tp_init() slot lies in the invariants they ensure. The tp_new() slot should ensure only the most essential invariants, without which the C code that implements the objects would break. The tp_init() slot should be used for overridable user-specific initializations. Take for example the dictionary type. The implementation has an internal pointer to a hash table which should never be NULL. This invariant is taken care of by the tp_new() slot for dictionaries. The dictionary tp_init() slot, on the other hand, could be used to give the dictionary an initial set of keys and values based on the arguments passed in.

...

You may wonder why the tp_new() slot shouldn't call the tp_init() slot itself. The reason is that in certain circumstances (like support for persistent objects), it is important to be able to create an object of a particular type without initializing it any further than necessary. This may conveniently be done by calling the tp_new() slot without calling tp_init(). It is also possible that tp_init() is not called, or called more than once -- its operation should be robust even in these anomalous cases.
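Both anomalous cases from the quoted passage can be reproduced from pure Python. The following is a small illustration, with Point and init_calls as made-up names: copy.copy rebuilds an ordinary object without calling __init__, and nothing stops __init__ from being called again on a live object.

```python
import copy

init_calls = 0


class Point:
    def __init__(self, x, y):
        global init_calls
        init_calls += 1
        self.x = x
        self.y = y


p = Point(1, 2)              # normal path: __new__ then __init__
bare = Point.__new__(Point)  # __new__ only: no x or y yet
q = copy.copy(p)             # the copy is rebuilt without calling __init__
calls_after_copy = init_calls
p.__init__(3, 4)             # __init__ again on a live object
```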

The entire point of a C++ wrapper is to enable you to write nice C++ code. Say for example that you want your object to have a data member that can only be initialized during its construction. If you create the object during tp_new, then you cannot reinitialize that data member during tp_init. This will probably force you to hold that data member via some kind of a smart pointer and create it during tp_new. This makes the code ugly.

The approach PyCXX takes is to separate object construction into two:

  • tp_new creates a dummy object with just a pointer to the C++ object, which is created in tp_init. This pointer is initially null.

  • tp_init allocates and constructs the actual C++ object, then updates the pointer in the dummy object created in tp_new to point to it. If tp_init is called more than once it raises a Python exception.
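The two steps above can be sketched in pure Python, with __new__ and __init__ playing the roles of tp_new and tp_init. This is not PyCXX code: Impl, BridgeSketch and _impl are hypothetical names, and RuntimeError is an assumed choice of exception, since which exception PyCXX raises is not stated here.

```python
class Impl:
    """Stand-in for the C++ FinalClass: fully built by its constructor."""

    def __init__(self, name):
        self.name = name


class BridgeSketch:
    def __new__(cls, *args, **kwargs):
        # Phase 1 (tp_new): allocate the shell; the pointer starts out null.
        self = super().__new__(cls)
        self._impl = None
        return self

    def __init__(self, name):
        # Phase 2 (tp_init): build the real object exactly once.
        if self._impl is not None:
            raise RuntimeError("already initialised")  # assumed exception type
        self._impl = Impl(name)


b = BridgeSketch("first")
try:
    b.__init__("second")
    second_init_failed = False
except RuntimeError:
    second_init_failed = True
```

The cost the answer mentions shows up here as the extra indirection: every use of the object has to go through _impl.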

I personally think that the overhead of this approach is too high for my own applications, but it's a legitimate approach. I have my own C++ wrapper around the Python/C API that does all the initialization in tp_new, which is also flawed. There doesn't appear to be a good solution for that.



Answer 2:

Here is a small C example that shows how Python allocates memory for objects of classes derived from C types:

typedef struct
{
    PyObject_HEAD
    int dummy[100];
} xxx_obj;

It also needs a type object:

static PyTypeObject xxx_type = 
{
    PyObject_HEAD_INIT(NULL)
};

And a module initialization function that initializes this type:

extern "C"
void init_xxx(void)
{
    PyObject* m;

    xxx_type.tp_name = "_xxx.xxx";
    xxx_type.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE;

    xxx_type.tp_new = tp_new; // IMPORTANT
    xxx_type.tp_basicsize = sizeof(xxx_obj); // IMPORTANT

    if (PyType_Ready(&xxx_type) < 0)
        return;

    m = Py_InitModule3("_xxx", NULL, "");

    Py_INCREF(&xxx_type);
    PyModule_AddObject(m, "xxx", (PyObject *)&xxx_type);
}

What is missing is the implementation of tp_new: The Python docs require that:

The tp_new function should call subtype->tp_alloc(subtype, nitems) to allocate space for the object

So let's do that and add a few printouts.

static
PyObject *tp_new(PyTypeObject *subtype, PyObject *args, PyObject *kwds)
{
    printf("xxx.tp_new():\n\n");

    printf("\t subtype=%s\n", subtype->tp_name);
    printf("\t subtype->tp_base=%s\n", subtype->tp_base->tp_name);
    printf("\t subtype->tp_base->tp_base=%s\n", subtype->tp_base->tp_base->tp_name);

    printf("\n");

    printf("\t subtype->tp_basicsize=%zd\n", subtype->tp_basicsize);
    printf("\t subtype->tp_base->tp_basicsize=%zd\n", subtype->tp_base->tp_basicsize);
    printf("\t subtype->tp_base->tp_base->tp_basicsize=%zd\n", subtype->tp_base->tp_base->tp_basicsize);

    return subtype->tp_alloc(subtype, 0); // IMPORTANT: memory allocation is done here!
}

Now run a very simple Python program to test it. This program creates a new class derived from xxx, and then creates an object of type derived.

import _xxx

class derived(_xxx.xxx):
    def __init__(self):
        super(derived, self).__init__()

d = derived()

To create an object of type derived, Python will call its tp_new, which in turn will call the tp_new of its base class (xxx). This call generates the following output (exact numbers depend on the machine architecture):

xxx.tp_new():

    subtype=derived
    subtype->tp_base=_xxx.xxx
    subtype->tp_base->tp_base=object

    subtype->tp_basicsize=432
    subtype->tp_base->tp_basicsize=416
    subtype->tp_base->tp_base->tp_basicsize=16

The subtype argument to tp_new is the type of the object being created (derived); it derives from our C type (_xxx.xxx), which in turn derives from object. The base object is of size 16, which is just PyObject_HEAD. The xxx type has an additional 400 bytes for its dummy member, for a total of 416 bytes, and the derived Python class adds an additional 16 bytes.

Because subtype->tp_basicsize accounts for the sizes of all three levels of the hierarchy (object, xxx and derived) for a total of 432 bytes, the right amount of memory is being allocated.
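A similar check can be made without compiling anything, since tp_basicsize is exposed to Python as __basicsize__. The sketch below is only an analogy: the built-in list stands in for the C type _xxx.xxx, and Derived uses __slots__ so that its growth is exactly one pointer per slot (how much a plain subclass without __slots__ grows appears to vary between interpreter versions). Derived, describe and POINTER_SIZE are names introduced for this example.

```python
import ctypes

POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)


class Derived(list):
    # Two slots and no __dict__ or __weakref__, so the growth is easy to count.
    __slots__ = ("a", "b")


def describe(tp):
    """Return (name, basicsize) for a type and each of its single bases."""
    chain = []
    while tp is not None:
        chain.append((tp.__name__, tp.__basicsize__))
        tp = tp.__base__
    return chain


for name, size in describe(Derived):
    print(f"{name:10s} tp_basicsize={size}")
```

As in the C example, the size reported for the most derived type already includes everything its bases need, which is why allocating through subtype is enough.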