I get a weird problem with __future__.unicode_literals in Python. Without importing unicode_literals I get the correct output:
# encoding: utf-8
# from __future__ import unicode_literals
name = 'helló wörld from example'
print name
But when I add the unicode_literals import:
# encoding: utf-8
from __future__ import unicode_literals
name = 'helló wörld from example'
print name
I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 4: ordinal not in range(128)
Does unicode_literals encode every string as UTF-8?
What should I do to fix this error?
Your terminal or console is failing to let Python know it supports UTF-8.
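One way to check this on your own machine is to ask Python which codec it believes the output stream accepts. The snippet below is only a diagnostic sketch: the helper name `report_output_encoding` is made up for illustration, and the values it reports depend entirely on your OS, locale settings, and terminal.

```python
from __future__ import print_function

import locale
import sys


def report_output_encoding():
    """Collect what Python believes about the output encoding (hypothetical helper)."""
    return {
        # Codec print uses for text strings; may be None when output is piped.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale settings of the environment.
        "preferred_encoding": locale.getpreferredencoding(),
        # Whether stdout is attached to an interactive terminal.
        "stdout_is_tty": bool(sys.stdout.isatty()) if hasattr(sys.stdout, "isatty") else False,
    }


info = report_output_encoding()
for key in sorted(info):
    print(key, "=", info[key])
```

If `stdout_encoding` comes back as an ASCII codec or `None`, that would be consistent with the error in the question.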
Without the from __future__ import unicode_literals line, you are building a byte string that holds UTF-8 encoded bytes. With the import, you are building a unicode string.

print has to treat these two values differently: a byte string is written to sys.stdout unchanged, while a unicode string is encoded to bytes first, and Python consults sys.stdout.encoding for that. If your system doesn't correctly tell Python what codec it supports, the default is to use ASCII.

Your system failed to tell Python what codec to use; sys.stdout.encoding is set to ASCII, and encoding the unicode value to print failed.

You can verify this by manually encoding to UTF-8 when printing, for example with print name.encode('utf8'). You can also reproduce the issue without the from __future__ import statement by writing the literal as u'helló wörld from example', because u'..' is a unicode literal too.

Without details on what your environment is, it is hard to say what the solution is; this depends very much on the OS and console or terminal used.
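The encoding step described above can be imitated without involving the terminal at all. The following is a minimal sketch, not what print does internally: `try_encode` is a hypothetical helper that encodes the same text with a given codec and reports where encoding failed, if it did.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# With unicode_literals (or a u'..' prefix) this is a text string, not bytes.
name = "helló wörld from example"


def try_encode(text, codec):
    """Encode text with codec; return (bytes or None, failing index or None)."""
    try:
        return text.encode(codec), None
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that could not be encoded.
        return None, exc.start


# Roughly what happens when sys.stdout.encoding is reported as ASCII.
ascii_bytes, ascii_error_at = try_encode(name, "ascii")

# What an explicit name.encode('utf8') produces instead.
utf8_bytes, utf8_error_at = try_encode(name, "utf-8")

print("ascii encodable:", ascii_bytes is not None)
print("utf-8 encodable:", utf8_bytes is not None)
```

Only ASCII-safe text is printed here on purpose, so the sketch itself does not trip over the same terminal problem.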