unicode_literals and type()

Posted 2019-04-04 17:41

Question:

I'm having problems supporting both Python 2 and Python 3 with a single type() call. This demonstrates the problem:

from __future__ import unicode_literals

name='FooClass'
type(name, (dict,), {})

No problem on python3, but on python2:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    type(name, (dict,), {})
TypeError: type() argument 1 must be string, not unicode

This is related to the question "Any gotchas using unicode_literals in Python 2.6?". In that question, someone recommends casting to a byte string, so naively I thought about using six.b(), which the six documentation describes as follows:

A “fake” bytes literal. data should always be a normal string literal. In Python 2, b() returns a 8-bit string. In Python 3, data is encoded with the latin-1 encoding to bytes.

So it looks like this:

from __future__ import unicode_literals
import six

name='FooClass'
type(six.b(name), (dict,), {})

But it fails on both python2 and python3:

$ python2 test.py 
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    type(six.b(name), (dict,), {})
TypeError: type() argument 1 must be string, not unicode

$ python3 test.py 
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    type(six.b(name), (dict,), {})
TypeError: type() argument 1 must be str, not bytes

So it seems that type() really wants the native str type on each version: a byte string on Python 2 (what Python 3 calls bytes), and a text string on Python 3 (what Python 2 calls unicode).

What do you think?

Is there something I don't understand?

Or is there a real incompatibility in type() between Python 2 and Python 3?

Is there any way to write a single type() call that supports both 2 and 3?

Shouldn't a tool like six provide a wrapper around type() for this case?

Answer 1:

six.b is written under the assumption that you won't use unicode_literals (and that you'll pass it a string literal, as the documentation states). The Python 2 implementation is therefore just def b(s): return s, because a Python 2 string literal is already a byte string. With unicode_literals active, name is unicode and is returned unchanged, hence the same error on Python 2.
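To make the failure on both versions concrete, here is a minimal sketch of the behaviour described above. The name b_sketch is hypothetical and stands in for six.b; this is an illustration of the two code paths under that assumption, not six's actual source.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    def b_sketch(s):
        # Python 2 path: hand the argument back untouched. If the caller
        # passed a unicode object, it stays unicode.
        return s
else:
    def b_sketch(s):
        # Python 3 path: encode the text to bytes using latin-1.
        return s.encode("latin-1")

# u"..." mimics what a plain literal becomes under unicode_literals.
result = b_sketch(u"FooClass")
print(type(result).__name__)
```

On either version the result is not the native str type (unicode on Python 2, bytes on Python 3), which is why type() rejects it in both tracebacks.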

Either don't use unicode_literals in this module, or use (as a comment suggests) str(name). In Python 3, that is a no-op. In Python 2, it silently converts the unicode string to a byte string using the default encoding (normally ASCII), so it works as long as the name contains only ASCII characters.
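A minimal sketch of the str(name) approach, assuming the class name is plain ASCII. The variable names FooClass and instance are only for illustration.

```python
from __future__ import unicode_literals

name = 'FooClass'

# str is the native string type on each version: a byte string on
# Python 2, a text string on Python 3. On Python 3 str(name) simply
# returns the same text.
FooClass = type(str(name), (dict,), {})

instance = FooClass(a=1)
print(FooClass.__name__)
```

The same file should run unchanged under both interpreters, since type() receives whichever string type that interpreter treats as str.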