Python print isn't using __repr__, __unicode__ or __str__ for my unicode subclass when printing. Any clues as to what I am doing wrong?
Here is my code:
Using Python 2.5.2 (r252:60911, Oct 13 2009, 14:11:59)
>>> class MyUni(unicode):
...     def __repr__(self):
...         return "__repr__"
...     def __unicode__(self):
...         return unicode("__unicode__")
...     def __str__(self):
...         return str("__str__")
...
>>> s = MyUni("HI")
>>> s
'__repr__'
>>> print s
'HI'
I'm not sure if this is an accurate approximation of the above, but just for comparison:
>>> class MyUni(object):
...     def __new__(cls, s):
...         return super(MyUni, cls).__new__(cls)
...     def __repr__(self):
...         return "__repr__"
...     def __unicode__(self):
...         return unicode("__unicode__")
...     def __str__(self):
...         return str("__str__")
...
>>> s = MyUni("HI")
>>> s
'__repr__'
>>> print s
'__str__'
[EDITED...] It sounds like the best way to get a string object that passes isinstance(instance, basestring), offers control over the unicode return values, and has a unicode-style repr is to subclass str instead:
>>> class UserUnicode(str):
...     def __repr__(self):
...         return "u'%s'" % super(UserUnicode, self).__str__()
...     def __str__(self):
...         return super(UserUnicode, self).__str__()
...     def __unicode__(self):
...         return unicode(super(UserUnicode, self).__str__())
...
>>> s = UserUnicode("HI")
>>> s
u'HI'
>>> print s
'HI'
>>> len(s)
2
The __str__ and __repr__ above add nothing to this example, but the idea is to show the pattern explicitly so that it can be extended as needed.
Just to prove that this pattern grants control:
>>> class UserUnicode(str):
...     def __repr__(self):
...         return "u'%s'" % "__repr__"
...     def __str__(self):
...         return "__str__"
...     def __unicode__(self):
...         return unicode("__unicode__")
...
>>> s = UserUnicode("HI")
>>> s
u'__repr__'
>>> print s
'__str__'
Thoughts?
You are subclassing unicode. It'll never call __unicode__ because it already is unicode. What happens here instead is that the object is encoded to the stdout encoding, much as if s.encode(sys.stdout.encoding) had been called, except that it'll use direct C calls instead of the .encode() method. This is the default behaviour of print for unicode objects.
The print statement calls PyFile_WriteObject, which in turn calls PyUnicode_AsEncodedString when handling a unicode object. The latter then defers to an encoding function for the current encoding, and these use the Unicode C macros to access the data structures directly. You cannot intercept this from Python.
What you are looking for is an __encode__ hook, I guess. Since this is already a unicode subclass, print needs only to encode it, not to convert it to unicode again, nor can it convert it to a string without encoding it explicitly. You'd have to take this up with the Python core developers, to see if an __encode__ hook makes sense.
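As a rough illustration of the encoding step described above, here is a minimal sketch. It only approximates what print does, since the real work happens in C and never goes through .encode(). The text_type alias and the UTF-8 fallback are assumptions added so that the snippet also runs on Python 3, where the unicode name no longer exists.

```python
import sys

# Assumption: look the text type up so the sketch runs on Python 2
# (the version in the question) as well as on Python 3.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


class MyUni(text_type):
    def __repr__(self):
        return "__repr__"

    def __str__(self):
        return "__str__"


s = MyUni("HI")

# Assumption: fall back to UTF-8 when stdout reports no encoding
# (for example when output is redirected to a pipe).
encoding = getattr(sys.stdout, "encoding", None) or "utf-8"

# Approximately what the Python 2 print statement hands to the stream
# for a unicode subclass: the encoded character data. Neither __str__
# nor __repr__ takes part in this step.
encoded = s.encode(encoding)
```

The encoded value holds the characters HI, not the result of either override, which matches the output seen in the question.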
makes sense.The problem is that
print
doesn't respect__str__
onunicode
subclasses.From
PyFile_WriteObject
, used byprint
:PyUnicode_Check(v)
returns true ifv
's type isunicode
or a subclass. This code therefore writes unicode objects directly, without consulting__str__
.Note that subclassing
str
and overriding__str__
works as expected:as does calling
str
orunicode
explicitly:I believe this could be construed as a bug in Python as currently implemented.