How do I get Cyrillic in the output, Python?

How do I get Cyrillic characters in the output instead of u'...' escapes?

The code looks like this:

import codecs

def openfile(filename):
    # Decode the file contents from UTF-8 into a unicode object.
    with codecs.open(filename, encoding="utf-8") as F:
        raw = F.read()
# do stuff...
print some_text

It prints:

[u'.', u',', u':', u'\u0432', u'<', u'>', u'(', u')', u'\u0437', u'\u0456']

Tags: python utf-8 encode

3 answers

劳资没心，怎么记你

Reply #2 · 2019-05-04 15:51

u'\uNNNN' is the ASCII-safe version of the string literal u'з':

>>> print u'\u0437'
з

However, this will only display correctly if your console supports the character you are trying to print. Trying the above in the console on a Western European Windows install fails:

>>> print u'\u0437'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0437' in position 0: character maps to <undefined>

Because getting the Windows console to output Unicode is tricky, Python 2's repr function always opts for the ASCII-safe literal version.

Your print statement outputs the repr version instead of the characters themselves because they are inside a list rather than a string, and printing a list shows the repr of each element. If you printed each member of the list individually, the characters would be output directly and not represented as u'...' string literals.
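As a rough illustration of printing the members one at a time, here is a minimal sketch. The helper name print_each and the shortened some_text list are assumptions made for the example, not part of the question's code. The u'' prefix is accepted by Python 2.7 and by Python 3.3 and later.

```python
# -*- coding: utf-8 -*-
import io
import sys


def print_each(items, stream=None):
    """Write every item on its own line instead of printing the whole list."""
    if stream is None:
        stream = sys.stdout
    for item in items:
        # Writing the element itself avoids the list's repr() of it.
        stream.write(item + u'\n')


# Shortened stand-in for the list in the question (assumed contents).
some_text = [u'.', u',', u'\u0432', u'\u0437', u'\u0456']

# Capture into an in-memory text buffer so the sketch does not depend
# on what the console is able to display.
buf = io.StringIO()
print_each(some_text, stream=buf)
captured = buf.getvalue()
```

Calling print_each(some_text) without a stream writes to sys.stdout, where the console encoding caveat described above still applies.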


混吃等死

Reply #3 · 2019-05-04 16:08

It looks like some_text is a list of unicode objects. When you print such a list, it prints the reprs of the elements inside the list. So instead try:

print(u''.join(some_text))

The join method concatenates the elements of some_text, using the empty string, u'', as the separator between them. The result is a single unicode object.
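To make the role of the separator concrete, here is a small sketch. The some_text list is a shortened, assumed stand-in for the one in the question.

```python
# -*- coding: utf-8 -*-

# Shortened stand-in for the list in the question (assumed contents).
some_text = [u'.', u',', u'\u0432', u'\u0437', u'\u0456']

# The string that join() is called on is placed between the elements.
joined = u''.join(some_text)   # nothing between the elements
spaced = u' '.join(some_text)  # one space between the elements

# Both results are single text objects rather than lists, so printing
# them shows the characters instead of u'...' literals.
is_text = isinstance(joined, type(u''))
```

Here joined holds the five characters back to back, while spaced holds the same characters separated by single spaces.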


Anthone

Reply #4 · 2019-05-04 16:10

It's not clear to me where some_text comes from (you cut out that bit of your code), so I have no idea why it prints as a list of characters rather than a string.

But you should be aware that in Python 2, print encodes unicode strings using sys.stdout.encoding, and falls back to ASCII when no encoding can be determined (for example, when the output is piped or the locale is C). If you want them encoded in some other encoding, you can do that explicitly:

>>> text = u'\u0410\u0430\u0411\u0431'
>>> print text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
  ordinal not in range(128)
>>> print text.encode('utf8')
АаБб
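The following sketch shows what explicit encoding produces and what the optional errors argument of encode does when the target encoding cannot represent a character. It reuses the sample string from the session above. The variable names are chosen for illustration only.

```python
# -*- coding: utf-8 -*-

text = u'\u0410\u0430\u0411\u0431'  # same sample string as above

# UTF-8 can represent every character; each of these takes two bytes.
utf8_bytes = text.encode('utf-8')

# ASCII cannot represent Cyrillic, so strict encoding raises.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# errors='replace' substitutes '?' for each unencodable character.
lossy = text.encode('ascii', 'replace')

# errors='backslashreplace' keeps the code points as \uNNNN escapes.
escaped = text.encode('ascii', 'backslashreplace')
```

Whether the encoded bytes then display correctly still depends on the terminal interpreting them with the same encoding.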
