I'm trying to overload some methods of the string builtin.
I know there is no really legitimate use-case for this, but the behavior still bugs me so I would like to get an explanation of what is happening here:
I'm using Python 2 and the forbiddenfruit module.
>>> from forbiddenfruit import curse
>>> curse(str, '__repr__', lambda self:'bar')
>>> 'foo'
'foo'
>>> 'foo'.__repr__()
'bar'
As you can see, the __repr__ function has been successfully overloaded, but it isn't actually called when we ask for a representation. Why is that?
Then, how would you do to get the expected behaviour:
>>> 'foo'
'bar'
There is no constraint on setting up a custom environment. If rebuilding Python is what it takes, so be it, but I really don't know where to start, and I still hope there is an easier way :)
The first thing to note is that whatever forbiddenfruit is doing, it's not affecting repr at all. This isn't a special case for str; it just doesn't work like that:
import forbiddenfruit
class X:
    repr = None
repr(X())
#>>> '<X object at 0x7f907acf4c18>'
forbiddenfruit.curse(X, "__repr__", lambda self: "I am X")
repr(X())
#>>> '<X object at 0x7f907acf4c50>'
X().__repr__()
#>>> 'I am X'
X.__repr__ = X.__repr__
repr(X())
#>>> 'I am X'
I recently found a much simpler way of doing what forbiddenfruit does, thanks to a post by HYRY:
import gc
underlying_dict = gc.get_referents(str.__dict__)[0]
underlying_dict["__repr__"] = lambda self: print("I am a str!")
"hello".__repr__()
#>>> I am a str!
repr("hello")
#>>> "'hello'"
So we know, somewhat anticlimactically, that something else is going on.
Here's the source for builtin_repr:
static PyObject *
builtin_repr(PyModuleDef *module, PyObject *obj)
/*[clinic end generated code: output=988980120f39e2fa input=a2bca0f38a5a924d]*/
{
    return PyObject_Repr(obj);
}
And for PyObject_Repr (sections elided):
PyObject *
PyObject_Repr(PyObject *v)
{
    PyObject *res;
    res = (*v->ob_type->tp_repr)(v);
    if (res == NULL)
        return NULL;
}
The important point is that instead of looking up __repr__ in a dict, it calls through the "cached" tp_repr slot on the type.
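A related effect can be seen in pure Python without touching any built-in. This is only an illustrative sketch (the class name Point and the strings are made up): an explicit attribute lookup finds a __repr__ stored on the instance, while repr() goes through the type's slot and ignores it.

```python
class Point:
    """Hypothetical class used only for this illustration."""

    def __repr__(self):
        return "type-level repr"


p = Point()

# Store a __repr__ in the instance's own dict; explicit attribute access finds it...
p.__repr__ = lambda: "instance-level repr"
explicit = p.__repr__()

# ...but repr() dispatches through type(p)'s slot and never consults p.__dict__.
implicit = repr(p)

print(explicit)  # instance-level repr
print(implicit)  # type-level repr
```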
Here's what happens when you set the attribute with something like TYPE.__repr__ = new_repr:
static int
type_setattro(PyTypeObject *type, PyObject *name, PyObject *value)
{
    if (!(type->tp_flags & Py_TPFLAGS_HEAPTYPE)) {
        PyErr_Format(
            PyExc_TypeError,
            "can't set attributes of built-in/extension type '%s'",
            type->tp_name);
        return -1;
    }
    if (PyObject_GenericSetAttr((PyObject *)type, name, value) < 0)
        return -1;
    return update_slot(type, name);
}
The first part is what prevents you from modifying built-in types. Then it sets the attribute generically (PyObject_GenericSetAttr) and, crucially, updates the slots.
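Both halves of type_setattro can be observed from Python. The following is a minimal sketch (the class name Heap is hypothetical); the exact TypeError message differs between CPython versions, so only the exception type is checked.

```python
class Heap:
    """Hypothetical user-defined (heap) type."""


# Assignment on a heap type goes through type_setattro, which also refreshes
# the tp_repr slot, so repr() sees the new method immediately.
Heap.__repr__ = lambda self: "patched"
heap_result = repr(Heap())

# The same assignment on a built-in type is rejected before anything is stored.
try:
    str.__repr__ = lambda self: "bar"
    rejected = False
except TypeError:
    rejected = True

print(heap_result)  # patched
print(rejected)     # True
```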
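If you're interested in how that works, update_slot is defined in CPython's Objects/typeobject.c. The crucial points are that it is a static (non-exported) function and that it writes directly into the PyTypeObject's slots, so replicating it would require hacking into the PyTypeObject type itself.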
If you want to do so, probably the easiest thing to try would be (temporarily?) setting the Py_TPFLAGS_HEAPTYPE bit in type->tp_flags on the str class. This would allow setting the attribute normally. Of course, there are no guarantees this won't crash your interpreter.
This is not what I want to do (especially not through ctypes) unless I really have to, so I offer you a shortcut.
You write:
Then, how would you do to get the expected behaviour:
>>> 'foo'
'bar'
This is actually quite easy using sys.displayhook:
sys.displayhook is called on the result of evaluating an expression entered in an interactive Python session. The display of these values can be customized by assigning another one-argument function to sys.displayhook.
And here's an example:
import sys
old_displayhook = sys.displayhook
def displayhook(value):
    # Show every str as 'bar'; defer to the original hook for everything else.
    if type(value) is str:
        old_displayhook('bar')
    else:
        old_displayhook(value)
sys.displayhook = displayhook
And then... (!)
'foo'
#>>> 'bar'
123
#>>> 123
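Note that the hook only changes what the interactive prompt echoes; repr() itself is untouched. The sketch below checks this by calling a hook directly and capturing what it prints. The names bar_displayhook and echoed are made up for the example, and it relies on sys.__displayhook__, which holds the interpreter's original hook.

```python
import contextlib
import io
import sys


def bar_displayhook(value):
    # Same idea as the hook above: every str is displayed as 'bar'.
    if type(value) is str:
        sys.__displayhook__('bar')
    else:
        sys.__displayhook__(value)


def echoed(hook, value):
    """Return the text that hook would print for value at the prompt."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        hook(value)
    return buffer.getvalue()


str_echo = echoed(bar_displayhook, 'foo')  # what the prompt would show for 'foo'
int_echo = echoed(bar_displayhook, 123)    # non-str values pass through unchanged
plain_repr = repr('foo')                   # repr() does not consult the hook
```

To undo a custom hook in a running session, assign sys.displayhook = sys.__displayhook__.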
On the philosophical point of why repr would be cached like this, first consider:
1 + 1
It would be a pain if this had to look up __add__ in a dictionary before calling. CPython is slow as it is, so it caches lookups of the standard dunder (double underscore) methods in type slots. __repr__ is one of those, even if it is less common to need the lookup optimized. This is still useful to keep formatting ('%s' % s) fast.