A simple test program that shows an encoding issue:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
print u"Råbjerg" # >>> unicodedata.name(u"å") = 'LATIN SMALL LETTER A WITH RING ABOVE'
Here is what I get when I run it from a Debian terminal. I do not understand why redirecting the output breaks it, since the text displays correctly without the redirect.
Can someone help me understand what I have missed? And what is the right way to print these characters so that they come out correctly everywhere?
$ python testu.py
Råbjerg
$ python testu.py > A
Traceback (most recent call last):
File "testu.py", line 3, in <module>
print u"Råbjerg"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 1: ordinal not in range(128)
I am using Debian GNU/Linux 6.0.7 (squeeze), configured with:
$ locale
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=
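Despite the UTF-8 locale, the traceback is exactly what encoding the string with the ASCII codec produces. This sketch (written to run under Python 2.7 or 3) reproduces the failure directly and reports the same offending position, 1, as the message above; bad_index is just an illustrative variable name.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

s = u"R\u00e5bjerg"  # same text as u"Råbjerg"

bad_index = None
try:
    s.encode("ascii")  # what a redirected Python 2 stdout effectively does
except UnicodeEncodeError as err:
    # err.start is the index of the first character ASCII cannot represent
    bad_index = err.start
    print(err.encoding, bad_index, repr(s[bad_index]))
```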
EDIT: after reading other similar questions pointed to in the answers below, I now use:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, locale
s = u"Råbjerg" # >>> unicodedata.name(u"å") = 'LATIN SMALL LETTER A WITH RING ABOVE'
if sys.stdout.encoding is None:  # output is piped or redirected: Python 2 sets this to None
    s = s.encode(locale.getpreferredencoding())
print s
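An alternative to encoding each string by hand is to wrap the byte stream once with a codecs writer, so every unicode string written to it is encoded on the way out. This is a sketch, not part of the question: encoding_writer is a hypothetical helper name, and io.BytesIO stands in for the redirected output so the result can be inspected. It runs under Python 2.7 or 3; in real Python 2 code you would wrap sys.stdout itself (sys.stdout = encoding_writer(sys.stdout)) and then print only unicode strings through it.

```python
# -*- coding: utf-8 -*-
import codecs
import io
import locale


def encoding_writer(byte_stream, encoding=None):
    """Hypothetical helper: wrap a byte stream so unicode written to it
    is encoded first. Defaults to the locale's preferred encoding, as the
    EDIT snippet does, and to UTF-8 if that comes back empty."""
    encoding = encoding or locale.getpreferredencoding() or "utf-8"
    return codecs.getwriter(encoding)(byte_stream)


buf = io.BytesIO()                   # stands in for a redirected stdout
out = encoding_writer(buf, "utf-8")  # explicit codec, independent of locale
out.write(u"R\u00e5bjerg\n")         # unicode in, UTF-8 bytes out
data = buf.getvalue()
print(repr(data))
```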
I suggest outputting it already encoded. This writes the string's UTF-8 bytes directly, so you will be able to read it in almost any editor or console that supports UTF-8.
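To make "already encoded" concrete: .encode('utf-8') turns the 7-character unicode string into 8 bytes, because å takes two bytes (0xC3 0xA5) in UTF-8, and those bytes pass through byte-oriented output untouched. A minimal sketch, with io.BytesIO standing in for a redirected stdout; it runs under Python 2.7 or 3.

```python
# -*- coding: utf-8 -*-
import io

text = u"R\u00e5bjerg"      # u"Råbjerg": 7 characters
data = text.encode("utf-8")  # 8 bytes: "å" becomes 0xC3 0xA5

# A binary stream accepts the bytes unchanged, whatever the locale is.
buf = io.BytesIO()
buf.write(data + b"\n")

# Decoding with the same codec gives the original text back.
assert data.decode("utf-8") == text
```

The flip side is that a console set to a non-UTF-8 encoding will display those two bytes as two wrong characters.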
When the output is redirected, sys.stdout is not connected to a terminal and Python cannot determine the output encoding, so Python 2 falls back to the default ASCII codec when printing unicode. When the output is not redirected, Python can detect that sys.stdout is a TTY and uses the codec configured for that TTY.
Set the PYTHONIOENCODING environment variable to tell Python which encoding to use in such cases, or encode explicitly. Use:
print u"Råbjerg".encode('utf-8')
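A sketch of the PYTHONIOENCODING route: it starts a child interpreter (sys.executable) with its stdout connected to a pipe, the same situation as python testu.py > A, and sets the variable only for that child. From the shell the equivalent is PYTHONIOENCODING=utf-8 python testu.py > A. CHILD is a hypothetical one-line program; the \u00e5 escape keeps this script ASCII-only. It runs under Python 2.7 or 3.

```python
import os
import subprocess
import sys

# Child program: prints a non-ASCII string (print(...) parses in Python 2 and 3).
CHILD = "print(u'R\\u00e5bjerg')"

# stdout is a pipe, not a TTY, but PYTHONIOENCODING tells the child
# which codec to use instead of falling back to ASCII.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)

print(repr(out))
```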
A similar question was asked today: Understanding Python Unicode and Linux terminal