How to prevent str to encode unicode characters as

When I print a unicode string in Python directly, I see the same characters that are in my string. When I put it into a container (a list, a map, etc.), the str representation of the container converts the unicode characters to \uXXXX escapes. Interestingly, I can call print on the container holding the string, but I cannot print str of the string itself (it raises a UnicodeEncodeError).

Can I configure str to encode nested strings as UTF-8? Looking at these hex codes makes debugging very painful.

Example:

>>> v = u"abc123абв"
>>> d = [v]
>>> print v
abc123абв
>>> print d
[u'abc123\u0430\u0431\u0432']
>>> print str(v)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-8: ordinal not in range(128)
>>> print str(d)
[u'abc123\u0430\u0431\u0432']

I'm using Python 2.7.6 on Ubuntu and the console encoding is UTF-8. Python seems to use UTF-8 as well:

>>> import sys, locale
>>> print(sys.stdout.encoding)
UTF-8
>>> print(locale.getpreferredencoding())
UTF-8
>>> print(sys.getfilesystemencoding())
UTF-8

Tags: python unicode encoding

2 Answers

beautiful°

#2 · 2019-01-20 19:07

Don't change str, change your way of thinking.

If you need to print a nested element, get it from the container and print it; don't print the whole container.

v = u"abc123абв"
d = [v, v, v]

print d[0]
# abc123абв

print ", ".join(d)
# abc123абв, abc123абв, abc123абв

By the way, Python prints the escape codes (and the other markers) for testing and debugging purposes.

When you see

[u'abc123\u0430\u0431\u0432']

you know it is a list ([ and ]) containing unicode text (u and ') and that the text contains non-ASCII characters.


Emotional °昔

#3 · 2019-01-20 19:29

print [v] calls repr(v), which returns ASCII-printable characters as is and escapes everything else using \x, \u, \U, ...

Remember that an object such as dict(a=1) is different from its text representation (repr(dict(a=1))). A Unicode string is an object like any other (type(v) == unicode), and therefore repr(v) is not v (by the way, repr(repr(v)) is not repr(v) either; think about it).
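
The difference between a string and its representation can be checked directly. This is a minimal sketch that reuses the name v from the question and writes the Cyrillic letters as escapes so that the snippet does not depend on the source file encoding; it should behave the same way on Python 2.7 and Python 3:

```python
v = u"abc123\u0430\u0431\u0432"  # same text as in the question

r = repr(v)    # text representation of the string object
rr = repr(r)   # text representation of that representation

# repr() adds quotes (and, on Python 2, a u prefix and \u escapes),
# so each level differs from the one before it.
assert r != v
assert rr != r
assert len(r) > len(v)
```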

To display human-readable text for debugging in the Python console, you could provide a custom sys.displayhook; e.g., you could encode any (embedded) unicode object using sys.stdout.encoding. In Python 3, repr(unicode_string) returns Unicode characters that are printable in the current environment as is (characters that would cause UnicodeEncodeError are escaped).
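
One possible shape for such a hook is sketched below. The names readable_repr and readable_displayhook are hypothetical, and decoding the repr with the unicode-escape codec is a shortcut chosen for this sketch rather than the only way to do it. It assumes a terminal whose encoding can represent the characters, as in the question:

```python
import sys


def readable_repr(obj):
    # Hypothetical helper, for display only: return repr(obj) with
    # non-ASCII characters shown as themselves.
    text = repr(obj)
    if isinstance(text, bytes):
        # Python 2: repr() returned an ASCII byte string containing
        # \uXXXX escapes. Decoding also unescapes \n, \\ and \xNN, so
        # the result may no longer be valid Python source.
        text = text.decode("unicode-escape")
    # Python 3: repr() already keeps printable non-ASCII characters.
    return text


def readable_displayhook(value):
    # Replacement for sys.displayhook that prints readable_repr(value).
    if value is None:
        return
    # Unlike the default hook, this sketch does not store the value in "_".
    print(readable_repr(value))


# Install for the current interactive session.
sys.displayhook = readable_displayhook
```

With the hook installed, evaluating d at the interactive prompt shows the Cyrillic letters instead of \u escapes. print d still goes through the ordinary repr on Python 2, so use print readable_repr(d) there.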

str(v) raising UnicodeEncodeError is unrelated. str(v) calls v.encode(sys.getdefaultencoding()), and therefore it fails for any unicode string with non-ASCII characters. Do not call str() on Unicode strings (it is almost always an error); print Unicode directly instead.
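
If bytes are really needed, encoding with an explicit codec avoids the implicit ASCII step. A small sketch, assuming the default encoding is ASCII (the usual case on Python 2) and reusing v from the question:

```python
v = u"abc123\u0430\u0431\u0432"

# Encoding with an explicit codec succeeds and yields UTF-8 bytes.
encoded = v.encode("utf-8")
assert encoded == b"abc123\xd0\xb0\xd0\xb1\xd0\xb2"
assert encoded.decode("utf-8") == v

# Encoding with ASCII, which is what str(v) implies on Python 2, fails.
try:
    v.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
assert ascii_failed
```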
