I'm trying to get a very simple example running on OS X with Python 3.5.1, but I'm really stuck. I have read many articles that deal with similar problems, but I cannot fix this by myself. Do you have any hints on how to resolve this issue?
I would like to get correctly encoded latin-1 output, as defined in mylist, without any errors.
My code:
# coding=<latin-1>
mylist = [u'Glück', u'Spaß', u'Ähre',]
print(mylist)
The error:
Traceback (most recent call last):
File "/Users/abc/test.py", line 4, in <module>
print(mylist)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 4: ordinal not in range(128)
How I can avoid the error, although the output on stdout (print) is still wrong:
mylist = [u'Glück', u'Spaß', u'Ähre',]
for w in mylist:
    print(w.encode("latin-1"))
What I get as output:
b'Gl\xfcck'
b'Spa\xdf'
b'\xc4hre'
What 'locale' shows me:
LANG="de_AT.UTF-8"
LC_COLLATE="de_AT.UTF-8"
LC_CTYPE="de_AT.UTF-8"
LC_MESSAGES="de_AT.UTF-8"
LC_MONETARY="de_AT.UTF-8"
LC_NUMERIC="de_AT.UTF-8"
LC_TIME="de_AT.UTF-8"
LC_ALL=
What 'python3' shows me:
Python 3.5.1 (default, Jan 22 2016, 08:54:32)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
Remove the characters < and >:
# coding=latin-1
Those characters are often used in examples to indicate where the encoding name goes, but the literal characters < and > should not be included in your file.
For that to work, your file must be encoded using latin-1. If your file is actually encoded using utf-8, the encoding line should be
# coding=utf-8
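If you are unsure which encoding your editor actually used, a rough check is to try decoding the raw bytes as UTF-8. This is only a sketch: the helper name guess_source_encoding is made up for illustration, and the fallback to latin-1 is an assumption, since any byte sequence decodes as latin-1.

```python
def guess_source_encoding(raw):
    """Return 'utf-8' if the bytes are valid UTF-8, else fall back to 'latin-1'.

    Hypothetical helper for illustration. Pure-ASCII input is reported as
    'utf-8' because ASCII is a subset of UTF-8.
    """
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence decodes as latin-1, so this is only a guess.
        return "latin-1"


# The same source line, as it would be stored in each encoding.
line = "mylist = [u'Gl\u00fcck', u'Spa\u00df', u'\u00c4hre',]"
print(guess_source_encoding(line.encode("latin-1")))
print(guess_source_encoding(line.encode("utf-8")))
```

To check a real file, pass in the bytes read with open(path, "rb").read().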
For example, when I run this script (saved as a file with latin-1 encoding):
# coding=latin-1
mylist = [u'Glück', u'Spaß', u'Ähre',]
print(mylist)
for w in mylist:
    print(w.encode("latin-1"))
I get this output (with no errors):
['Glück', 'Spaß', 'Ähre']
b'Gl\xfcck'
b'Spa\xdf'
b'\xc4hre'
That output looks correct. For example, the latin-1 encoding of ü is '\xfc'.
I used my editor to save the file with latin-1 encoding. The contents of the file in hexadecimal are:
$ hexdump -C codec-question.py
00000000 23 20 63 6f 64 69 6e 67 3d 6c 61 74 69 6e 2d 31 |# coding=latin-1|
00000010 0a 0a 6d 79 6c 69 73 74 20 3d 20 5b 75 27 47 6c |..mylist = [u'Gl|
00000020 fc 63 6b 27 2c 20 75 27 53 70 61 df 27 2c 20 75 |.ck', u'Spa.', u|
00000030 27 c4 68 72 65 27 2c 5d 0a 70 72 69 6e 74 28 6d |'.hre',].print(m|
00000040 79 6c 69 73 74 29 0a 0a 66 6f 72 20 77 20 69 6e |ylist)..for w in|
00000050 20 6d 79 6c 69 73 74 3a 0a 20 20 20 20 70 72 69 | mylist:.    pri|
00000060 6e 74 28 77 2e 65 6e 63 6f 64 65 28 22 6c 61 74 |nt(w.encode("lat|
00000070 69 6e 2d 31 22 29 29 0a |in-1")).|
00000078
Note that the first byte (represented in hexadecimal) in the third line (i.e. the character at position 0x20) is fc. That is the latin-1 encoding of ü. If the file was encoded using utf-8, the character ü would be represented using two bytes, c3 bc.
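The same byte values can be checked from Python itself. The sketch below simply encodes the three words from the question in both encodings. It uses ascii() for the first column so that the check itself does not depend on the terminal encoding.

```python
words = ["Gl\u00fcck", "Spa\u00df", "\u00c4hre"]  # Glück, Spaß, Ähre

for w in words:
    latin1 = w.encode("latin-1")  # one byte per character
    utf8 = w.encode("utf-8")      # each non-ASCII character here takes two bytes
    print(ascii(w), latin1, utf8)
```

For the first word, the latin-1 bytes contain fc and the utf-8 bytes contain c3 bc, matching the hexdump discussion.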
Try running your script with the PYTHONIOENCODING environment variable explicitly defined:
PYTHONIOENCODING=utf-8 python3 script.py
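To see which encoding print will actually use, you can inspect sys.stdout.encoding. If setting the environment variable is not an option, wrapping the underlying byte stream is one possible workaround. This is a sketch rather than the only way to do it, and make_utf8_writer is a hypothetical helper name.

```python
import io
import locale
import sys

# What Python picked for stdout, and what the locale suggests.
print(sys.stdout.encoding, locale.getpreferredencoding())


def make_utf8_writer(byte_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8.

    Hypothetical helper for illustration.
    """
    return io.TextIOWrapper(byte_stream, encoding="utf-8", line_buffering=True)


# Demonstrate on an in-memory buffer instead of the real stdout.
buf = io.BytesIO()
out = make_utf8_writer(buf)
print(["Gl\u00fcck", "Spa\u00df", "\u00c4hre"], file=out)
out.flush()
print(buf.getvalue())
```

For the real stream you would pass sys.stdout.buffer, for example sys.stdout = make_utf8_writer(sys.stdout.buffer).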
Your environment variables are set incorrectly. This works for me:
echo "LC_ALL=en_US.UTF-8" >> /etc/environment
echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen
echo "LANG=en_US.UTF-8" > /etc/locale.conf
locale-gen en_US.UTF-8