Python 3: How do I get a string literal representa

Posted 2019-04-08 07:13

Question:

In Python 3, how do I interpolate a byte string into a regular string and get the same behavior as Python 2 (i.e.: get just the escape codes without the b prefix or double backslashes)?

e.g.:

Python 2.7:

>>> x = u'\u041c\u0438\u0440'.encode('utf-8')
>>> str(x)
'\xd0\x9c\xd0\xb8\xd1\x80'
>>> 'x = %s' % x
'x = \xd0\x9c\xd0\xb8\xd1\x80'

Python 3.3:

>>> x = u'\u041c\u0438\u0440'.encode('utf-8')
>>> str(x)
"b'\\xd0\\x9c\\xd0\\xb8\\xd1\\x80'"
>>> 'x = %s' % x
"x = b'\\xd0\\x9c\\xd0\\xb8\\xd1\\x80'"

Note how, with Python 3, I get the b prefix and doubled backslashes in my output. The result I would like is the one I get in Python 2.

Answer 1:

In Python 2 you have types str and unicode. str represents a simple byte string while unicode is a Unicode string.

For Python 3, this changed: str is now what unicode was in Python 2, and bytes is what str was in Python 2.
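
The rename can be checked in a Python 3 interpreter. This is a minimal sketch; the variable names text, data and first_byte are only illustrative:

```python
# Python 3: plain literals are str (Unicode); byte strings are a separate type.
text = '\u041c\u0438\u0440'        # str holding three code points
data = text.encode('utf-8')        # bytes holding six bytes

print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

# Indexing bytes yields an integer, not a one-character string as in Python 2.
first_byte = data[0]
print(first_byte)
```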

So when you write ("x = %s" % '\u041c\u0438\u0440').encode("utf-8") you can omit the u prefix, as it is implicit. In Python 3, every string literal without a b prefix is a Unicode string.

In Python 3, this produces the same bytes as your last Python 2 line:

 ("x = %s" % '\u041c\u0438\u0440').encode("utf-8")

Note how I encode only after building the final result, which is what you should always do: take an incoming object, decode it to Unicode (however you do that), and then, when producing output, encode it in the encoding of your choice. Don't try to handle raw byte strings. That is just ugly and deprecated behaviour.
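
The decode-early, encode-late advice above could be sketched as follows. The helper name make_line and its encoding parameter are hypothetical, chosen only for illustration:

```python
def make_line(raw, encoding='utf-8'):
    """Hypothetical helper: bytes in, bytes out, str in between."""
    text = raw.decode(encoding)       # decode at the input boundary
    line = 'x = {}'.format(text)      # do all formatting on str
    return line.encode(encoding)      # encode at the output boundary

raw = '\u041c\u0438\u0440'.encode('utf-8')
result = make_line(raw)
print(result)
```

Keeping the decode and encode calls at the edges means the formatting step never has to deal with a bytes repr.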



Answer 2:

In your Python 3 example, you are interpolating into a Unicode string, not a byte string like you are doing in Python 2.

In Python 3.3, bytes do not support interpolation (neither %-formatting nor str.format()).

Either concatenate, or use Unicode all through and only encode when you have interpolated:

b'x = ' + x

or

'x = {}'.format(x.decode('utf8')).encode('utf8')

or

x = '\u041c\u0438\u0440'  # a u prefix is allowed here but has no effect in Python 3.3
'x = {}'.format(x).encode('utf8')
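
For readers on a later interpreter: the options above were written for Python 3.3. Python 3.5 added %-formatting for bytes (PEP 461), so a version-guarded sketch might look like this (the name line is illustrative):

```python
import sys

x = '\u041c\u0438\u0440'.encode('utf-8')

if sys.version_info >= (3, 5):
    # %-formatting on bytes is available from Python 3.5 onward.
    line = b'x = %s' % x
else:
    # Earlier 3.x versions raise TypeError for the above, so concatenate.
    line = b'x = ' + x
print(line)
```

Note that bytes still has no format() method, so the str.format() variants continue to need the decode and encode steps.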


Answer 3:

In Python 2, byte strings and regular strings are the same, so str() does no conversion. In Python 3 a string is always a Unicode string, so str() of a byte string does a conversion: it returns the repr of the bytes object, which is where the b prefix and the escaped backslashes come from.

You can do your own conversion instead that does what you want:

x2 = ''.join(chr(c) for c in x)
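
To see what this conversion produces, here is a small sketch. The latin-1 comparison and the use of ascii() are additions for illustration and not part of the answer itself:

```python
x = '\u041c\u0438\u0440'.encode('utf-8')

x2 = ''.join(chr(c) for c in x)   # the conversion from the answer
x3 = x.decode('latin-1')          # same mapping: one code point per byte

print(x2 == x3)

# ascii() escapes every non-ASCII character, giving the Python 2 look.
print(ascii('x = %s' % x2))

# The mapping is reversible, so the original bytes can be recovered.
roundtrip_ok = x2.encode('latin-1') == x
print(roundtrip_ok)
```

The resulting str holds one code point per original byte, so it is not the decoded text; it only mirrors the byte values.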