How do I get Cyrillic in the output, Python?

Posted 2019-05-04 15:36

How do I get Cyrillic instead of u'...?

The code looks like this:

import codecs

def openfile(filename):
    # Decode the file as UTF-8, so read() returns a unicode object.
    with codecs.open(filename, encoding="utf-8") as f:
        raw = f.read()
# do stuff...
print some_text

It prints:

>>>[u'.', u',', u':', u'\u0432', u'<', u'>', u'(', u')', u'\u0437', u'\u0456']

3 answers
劳资没心,怎么记你
#2 · 2019-05-04 15:51

u'\uNNNN' is the ASCII-safe version of the string literal u'з':

>>> print u'\u0437'
з

However, this will only display correctly if your console supports the character you are trying to print. Trying the above in the console on a Western European Windows install fails:

>>> print u'\u0437'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0437' in position 0: character maps to <undefined>

Because getting the Windows console to output Unicode is tricky, Python 2's repr function always opts for the ASCII-safe literal version.

Your print statement is outputting the repr version and not printing characters directly because you've got them inside a list of characters instead of a string. If you did print on each of the members of the list, you'd get the characters output directly and not represented as u'...' string literals.
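The difference between printing the list and printing its members can be sketched as below. This is only an illustration: some_text here is a stand-in for the list in the question, whose real origin is not shown. The u'' prefixes are accepted by Python 2 and by Python 3.3+, but note that Python 3's repr shows non-ASCII characters directly instead of as \uNNNN escapes, so the first print only looks like the question's output on Python 2.

```python
from __future__ import print_function

# Stand-in for the list from the question (assumed; its real source is not shown).
some_text = [u'.', u'\u0432', u'\u0437', u'\u0456']

# Printing the list itself shows the repr() of every element.
print(some_text)

# Printing each element on its own writes the character itself,
# provided the console encoding can represent it.
for ch in some_text:
    print(ch)
```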

混吃等死
#3 · 2019-05-04 16:08

It looks like some_text is a list of unicode objects. When you print such a list, it prints the reprs of the elements inside the list. So instead try:

print(u''.join(some_text))

The join method concatenates the elements of some_text, placing the separator it is called on between them. Here the separator is the empty string, u'', so the elements are simply run together. The result is one unicode object.
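As a small sketch of how the separator affects the result, assuming a hypothetical three-character list in place of the real some_text:

```python
from __future__ import print_function

# Hypothetical stand-in for the list from the question.
some_text = [u'\u0432', u'\u0437', u'\u0456']

joined = u''.join(some_text)   # empty separator: characters run together
spaced = u' '.join(some_text)  # one space between consecutive elements

print(joined)
print(spaced)
```

Either way the result is a single string, so print writes the characters themselves rather than their reprs.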

Anthone
#4 · 2019-05-04 16:10

It's not clear to me where some_text comes from (you cut out that bit of your code), so I have no idea why it prints as a list of characters rather than a string.

But you should be aware that when you print a unicode string, Python 2 encodes it using the encoding it detected for standard output, and falls back to ASCII when it cannot detect one (for example, when the output is piped). If you want the text encoded some other way, you can do that explicitly:

>>> text = u'\u0410\u0430\u0411\u0431'
>>> print text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
  ordinal not in range(128)
>>> print text.encode('utf8')
АаБб
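For a rough idea of what explicit encoding produces, here is a sketch that inspects the encoded bytes without relying on the console. The variable names are illustrative only; the 'replace' and 'backslashreplace' error handlers are standard options of encode() for characters the target encoding cannot represent.

```python
from __future__ import print_function
import sys

text = u'\u0410\u0430\u0411\u0431'

# The encoding Python detected for standard output.
# On Python 2 this can be None when output is piped.
print(sys.stdout.encoding)

# UTF-8 uses two bytes for each of these Cyrillic letters.
encoded = text.encode('utf-8')

# Alternatives to raising UnicodeEncodeError when the target
# encoding lacks the characters:
fallback = text.encode('ascii', 'replace')           # each letter becomes '?'
escaped = text.encode('ascii', 'backslashreplace')   # each letter becomes a \uNNNN escape

print(len(encoded))
```

On Python 2, print text.encode('utf8') writes those bytes straight to the terminal, which displays them correctly only if the terminal itself interprets its input as UTF-8.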