Calling __new__ when making a subclass of tuple

Posted 2020-07-13 04:22

Question:

In Python, when subclassing tuple, the __new__ function is called with self as an argument. For example, here is a paraphrased version of PySpark's Row class:

class Row(tuple):
    def __new__(self, args):
        return tuple.__new__(self, args)

But help(tuple) shows no self argument to __new__:

  __new__(*args, **kwargs) from builtins.type
      Create and return a new object.  See help(type) for accurate signature.

and help(type) just says the same thing:

__new__(*args, **kwargs)
      Create and return a new object.  See help(type) for accurate signature.

So how does self get passed to __new__ in the Row class definition?

  • Is it via *args?
  • Does __new__ have some subtlety where its signature can change with context?
  • Or, is the documentation mistaken?

Is it possible to view the source of tuple.__new__ so I can see the answer for myself?

My question is not a duplicate of this one because in that question, all discussion refers to __new__ methods that explicitly have self or cls as first argument. I'm trying to understand

  1. Why the tuple.__new__ method does not have self or cls as first argument.
  2. How I might go about examining the source code of the tuple class, to see for myself what's really going on.

Answer 1:

The correct signature of tuple.__new__

Functions and types implemented in C often can't be introspected, and help() frequently shows only a generic signature like that one.

The correct signature of tuple.__new__ is:

__new__(cls[, sequence])

For example:

>>> tuple.__new__(tuple)
()
>>> tuple.__new__(tuple, [1, 2, 3])
(1, 2, 3)

Not surprisingly, this is exactly like calling tuple(), except that you have to write tuple twice.
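As a small sketch of the call shape above (Pair is a hypothetical subclass, named here only for illustration), the first argument decides which class the new object belongs to:

```python
# Hypothetical subclass used only for illustration.
class Pair(tuple):
    pass

# The first argument selects the class of the new object;
# the optional second argument supplies the items.
plain = tuple.__new__(tuple, [1, 2, 3])
sub = tuple.__new__(Pair, [1, 2, 3])

print(plain, type(plain).__name__)  # (1, 2, 3) tuple
print(sub, type(sub).__name__)      # (1, 2, 3) Pair
```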


The first argument of __new__

Note that the first argument of __new__ is always the class, not the instance. In fact, the role of __new__ is to create and return the new instance.

The special method __new__ is a static method.

I'm pointing this out because your Row.__new__ names its first argument self. The name itself doesn't matter (except when using keyword arguments), but beware that self will be Row or a subclass of Row, not an instance. The general convention is to name the first argument cls instead of self.
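A quick way to check this yourself is to record what the first argument actually is. In this sketch, SubRow and the seen list are hypothetical names added for the demonstration:

```python
seen = []  # collects whatever is passed as the first argument

class Row(tuple):
    def __new__(cls, args):
        # cls is the class being instantiated, not an instance.
        seen.append(cls)
        return tuple.__new__(cls, args)

class SubRow(Row):  # hypothetical subclass for the demonstration
    pass

Row([1])
SubRow([2])
print([c.__name__ for c in seen])  # ['Row', 'SubRow']

# Python wraps a plain __new__ function in staticmethod automatically.
print(isinstance(Row.__dict__["__new__"], staticmethod))  # True
```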


Back to your questions

So how does self get passed to __new__ in the Row class definition?

When you call Row(...), Python automatically calls Row.__new__(Row, ...). The call goes through type.__call__, which passes the class itself as the first argument.
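The following sketch compares the three spellings of that call. It is only an illustration of the dispatch, not a description of everything type.__call__ does (it also runs __init__ afterwards):

```python
class Row(tuple):
    def __new__(cls, args):
        return tuple.__new__(cls, args)

via_call = Row([1, 2])                  # the usual spelling
via_type = type.__call__(Row, [1, 2])   # what the call machinery does
via_new = Row.__new__(Row, [1, 2])      # the class is passed explicitly

print(via_call == via_type == via_new)  # True
print(type(via_new).__name__)           # Row
```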

  • Is it via *args?

You can write your Row.__new__ as follows:

class Row(tuple):
    def __new__(*args, **kwargs):
        return tuple.__new__(*args, **kwargs)

This works and there's nothing wrong with it. It's very useful if you don't care about the arguments.
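A short usage check of that pass-through version, showing that the class simply arrives as the first positional argument:

```python
class Row(tuple):
    def __new__(*args, **kwargs):
        # args[0] is the class (Row or a subclass); the rest is forwarded.
        return tuple.__new__(*args, **kwargs)

empty = Row()
filled = Row([1, 2, 3])

print(empty, filled)          # () (1, 2, 3)
print(type(filled).__name__)  # Row
```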

  • Does __new__ have some subtlety where its signature can change with context?

No, the only special thing about __new__ is that it is a static method.

  • Or, is the documentation mistaken?

I'd say that it is incomplete or ambiguous.

  • Why the tuple.__new__ method does not have self or cls as first argument.

It does have one; it just doesn't appear in help(tuple.__new__), because that information often is not exposed by functions and methods implemented in C.
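To see the difference between Python-level and C-level introspection, you can compare the two with inspect.signature. This is a rough sketch; describe_signature is a hypothetical helper, and what it reports for tuple.__new__ depends on the Python version:

```python
import inspect

def describe_signature(func):
    """Return the signature as text, or a note if introspection fails."""
    try:
        return str(inspect.signature(func))
    except (ValueError, TypeError) as exc:
        return f"unavailable ({exc})"

class Row(tuple):
    def __new__(cls, args):
        return tuple.__new__(cls, args)

# A __new__ written in Python exposes its real parameters.
print(describe_signature(Row.__new__))    # (cls, args)
# The C-level one is generic or unavailable, depending on the version.
print(describe_signature(tuple.__new__))
```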

  • How I might go about examining the source code of the tuple class, to see for myself what's really going on.

The file you are looking for is Objects/tupleobject.c in the CPython source tree. Specifically, you are interested in the tuple_new() function:

static char *kwlist[] = {"sequence", 0};
/* ... */
if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O:tuple", kwlist, &arg))
    return NULL;

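Here "|O:tuple" means: the function is reported as "tuple" in error messages and it accepts one optional argument (| marks the arguments after it as optional, O stands for a Python object). In this version of the source, the optional argument may also be set via the keyword argument sequence. The class does not appear in the format string because C-level __new__ implementations receive it as a separate parameter, ahead of args and kwds.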
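The effect of that format string can be observed from Python. This sketch sticks to positional calls, since whether the keyword form is accepted depends on the Python version:

```python
print(tuple())           # ()
print(tuple([1, 2, 3]))  # (1, 2, 3)

# Only one optional argument is accepted, so a second one is rejected.
error = None
try:
    tuple([1], [2])
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```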


About help(type)

For reference, you were looking at the documentation of type.__new__, when you should have stopped at the first four lines of help(type).

In the case of type.__new__(), the correct signature is the signature of type():

class type(object)
 |  type(object_or_name, bases, dict)
 |  type(object) -> the object's type
 |  type(name, bases, dict) -> a new type

But this is not relevant, as tuple.__new__ has a different signature.


Remember super()!

Last but not least, try to use super() instead of calling tuple.__new__() directly, so that the class keeps working if its base classes change.
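A minimal sketch of the Row class rewritten that way (a paraphrase for illustration, not PySpark's actual code):

```python
class Row(tuple):
    def __new__(cls, args):
        # Zero-argument super() works inside __new__, but cls must still be
        # passed explicitly because __new__ is a static method.
        return super().__new__(cls, args)

row = Row([1, 2, 3])
print(row, type(row).__name__)  # (1, 2, 3) Row
```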