python… encoding issue when using linux > [duplica

Posted 2020-01-29 17:01

Question:

A simple test program that reproduces an encoding issue:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
print u"Råbjerg"      # >>> unicodedata.name(u"å") = 'LATIN SMALL LETTER A WITH RING ABOVE'

Here is what I get when I run it from a Debian command line. I do not understand why redirecting the output breaks it, since the character displays correctly without the redirect.

Can someone help me understand what I have missed? And what is the right way to print these characters so that they come out correctly everywhere?

$ python testu.py
Råbjerg

$ python testu.py > A
Traceback (most recent call last):
  File "testu.py", line 3, in <module>
    print u"Råbjerg"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 1: ordinal not in range(128)

This is on Debian GNU/Linux 6.0.7 (squeeze), configured with:

$ locale
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

EDIT: based on similar questions I found later through the pointers given below:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, locale
s = u"Råbjerg"      # >>> unicodedata.name(u"å") = 'LATIN SMALL LETTER A WITH RING ABOVE'
if sys.stdout.encoding is None:  # when stdout is a pipe or a file, Python 2 seems to return None
    s = s.encode(locale.getpreferredencoding())  # encode by hand using the locale's encoding
print s

Answer 1:

When the output is redirected, sys.stdout is not connected to a terminal and Python 2 cannot determine the output encoding, so it falls back to ASCII, which is where the 'ascii' codec error comes from. When the output is not redirected, Python can detect that sys.stdout is a TTY and will use the codec configured for that TTY when printing unicode.
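
As a rough illustration of that explanation (a sketch, not part of the original answer), the snippet below inspects what Python knows about stdout and then reproduces the failure by encoding the string to ASCII by hand. The variable names are made up for the example, and the string is written with an escape so the snippet does not depend on the source file encoding.

```python
import sys

# What Python knows about the current stdout. Under Python 2, `encoding` is
# None when stdout is redirected to a file or a pipe.
stdout_is_tty = sys.stdout.isatty()
stdout_encoding = getattr(sys.stdout, "encoding", None)

text = u"R\xe5bjerg"  # same string as in the question

# ASCII cannot represent U+00E5, so this explicit encode fails in the same
# way as the implicit one that Python 2 performs on a redirected stdout.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc
```

The caught exception reports the same codec and the same position (index 1, the "å") as the traceback in the question.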

Set the PYTHONIOENCODING environment variable to tell Python what encoding to use in such cases, or encode explicitly.
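
From the shell this would typically look like `PYTHONIOENCODING=utf-8 python testu.py > A`. The sketch below shows the same idea from inside Python, assuming a child interpreter whose stdout is a pipe (so it has no terminal to take an encoding from). The helper name `run_child_with_encoding` is hypothetical and only serves the example.

```python
import os
import subprocess
import sys

def run_child_with_encoding(encoding):
    """Run a child Python with piped stdout and a forced I/O encoding."""
    # Copy the current environment and add PYTHONIOENCODING for the child.
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    # The child prints the string from the question, written with an escape.
    child_code = "print(u'R\\xe5bjerg')"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        env=env,
    )
    out, _ = proc.communicate()
    return out.rstrip(b"\r\n")  # drop the trailing newline

utf8_bytes = run_child_with_encoding("utf-8")
latin1_bytes = run_child_with_encoding("latin-1")
```

The child never encodes anything itself. The bytes it emits change only because of the environment variable, which is why this approach needs no changes to the script.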



Answer 2:

Use: print u"Råbjerg".encode('utf-8')

A similar question was asked today: Understanding Python Unicode and Linux terminal



Answer 3:

I suggest you output it already encoded:

print u"Råbjerg".encode('utf-8')

This writes the correct UTF-8 bytes of the string, and you will be able to view the result in almost any editor or console that supports UTF-8.
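
To make the "correct bytes" concrete, here is a minimal sketch of what explicit encoding produces. It also shows one common way to avoid calling .encode() on every print, namely wrapping a byte stream in a UTF-8 writer from the codecs module. That wrapper is not part of the answers above. The io.BytesIO object is only a stand-in for a redirected stdout, and the variable names are illustrative.

```python
import codecs
import io

text = u"R\xe5bjerg"  # same string as in the question

# Explicit encoding, as suggested: "å" becomes the two bytes C3 A5.
explicit = text.encode("utf-8")

# Alternative: wrap a byte-oriented stream once, then write unicode to it.
raw = io.BytesIO()                       # stand-in for a redirected stdout
writer = codecs.getwriter("utf-8")(raw)  # encodes on every write
writer.write(text + u"\n")
written = raw.getvalue()
```

In a Python 2 script the same wrapper is often applied to sys.stdout itself, so that every later print goes through the UTF-8 encoder regardless of whether the output is a terminal, a file, or a pipe.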