How do I get Cyrillic instead of u'...'?
The code is like this:
    import codecs

    def openfile(filename):
        # codecs.open decodes the file as UTF-8, so raw is a unicode object
        with codecs.open(filename, encoding="utf-8") as F:
            raw = F.read()
        # do stuff...
        print some_text
It prints:
>>>[u'.', u',', u':', u'\u0432', u'<', u'>', u'(', u')', u'\u0437', u'\u0456']
u'\uNNNN' is the ASCII-safe form of a string literal; for example, u'\u0437' is the ASCII-safe version of the string literal u'з'. However, the character itself will only display correctly if your console supports the character you are trying to print. Printing u'з' on the console of a Western European Windows install fails.
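As a rough illustration of the point above, the sketch below checks that the escape denotes a single character and that a Western European encoding cannot represent it. The helper `can_encode` is hypothetical, and `cp1252` is only an assumed stand-in for whatever code page such a console actually uses.

```python
from __future__ import print_function

import unicodedata

ze = u'\u0437'  # escape form of a single Cyrillic character

# The escape is only a spelling: the string holds exactly one character.
name = unicodedata.name(ze)


def can_encode(text, encoding):
    """Hypothetical helper: True if `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 is an assumed stand-in for a Western European console code page.
print(len(ze), name)
print(can_encode(ze, 'cp1252'), can_encode(ze, 'utf-8'))
```

The failure on the Windows console is the same kind of encoding error: the character has no mapping in the console's code page.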
Because getting the Windows console to output Unicode is tricky, Python 2's repr function always opts for the ASCII-safe literal version.

Your print statement is outputting the repr version and not printing the characters directly because you've got them inside a list of characters instead of a string. If you called print on each member of the list, you'd get the characters output directly and not represented as u'...' string literals.

It looks like some_text is a list of unicode objects. When you print such a list, it prints the reprs of the elements inside the list. So instead try print u''.join(some_text). The join method concatenates the elements of some_text, with an empty string, u'', in between the elements. The result is one unicode object.

It's not clear to me where some_text comes from (you cut out that bit of your code), so I have no idea why it prints as a list of characters rather than a string. But you should be aware that Python 2 encodes unicode strings with the terminal's encoding when you print them, and falls back to ASCII when it cannot detect one. If you want them encoded in some other encoding, you can do that explicitly by calling encode on the string before printing it.
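A minimal sketch of the join and explicit-encoding suggestions, written so it runs on both Python 2 and 3. The contents of `some_text` are a made-up stand-in for the list in the question, the helper names `as_text` and `as_bytes` are hypothetical, and UTF-8 is only an assumed target encoding; use whatever your terminal expects.

```python
from __future__ import print_function

# Made-up stand-in for the list printed in the question.
some_text = [u'.', u',', u'\u0432', u'\u0437', u'\u0456']


def as_text(items):
    """Join a list of unicode strings into one unicode string."""
    return u''.join(items)


def as_bytes(items, encoding='utf-8'):
    """Join the items, then encode explicitly ('utf-8' is an assumption)."""
    return as_text(items).encode(encoding)


joined = as_text(some_text)
encoded = as_bytes(some_text)

# ASCII-only summary, safe to print on any console:
# item count, character count, byte count.
print(len(some_text), len(joined), len(encoded))
```

With a console that can display the characters, `print(as_text(some_text))` shows the Cyrillic text directly, and looping with `for ch in some_text: print(ch)` shows one character per line without the u'...' wrapper.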