Force UTF-8 output using Python

Posted 2019-08-07 13:14

Question:

I have the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xd7' in position 31: ordinal not in range(128)

from this code:

test_string = """
Antelope Canyon, Arizona [1600×1068] </a>&#32; <span class="domain">(<a
"""

print(test_string)

Output of sys.getdefaultencoding():

In [6]: sys.getdefaultencoding()
Out[10]: 'utf-8'

I'm using a Chromebook with crouton - if that makes a difference (I've a feeling that it might).

I'm not sure if there's some way of 'forcing' the output of strings like this or just ignoring any chars that are problematic.

Your terminal, console, or redirect cannot handle UTF-8; what environment are you trying to print in?

I'm trying to run this using IPython within Spacemacs.

In [22]: sys.stdout.encoding
Out[27]: 'ANSI_X3.4-1968'

In the shell, what does the command locale output?

In the shell I'm running this in (IPython within Spacemacs) the command is undefined; in the default shell brought up with Ctrl+Alt+T the output is:

$ locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Answer 1:

On a POSIX host, Python determines the output encoding from the locale, a set of environment variables that communicate how the environment is configured for various language settings. See the locale.getdefaultlocale() function, or more specifically, the locale.getpreferredencoding() function.

The output of that function is used to set sys.stdout.encoding, which is then used to encode any Unicode text printed.
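To see which encoding Python picked up from the locale, something like the following can be run in the same interpreter that raises the error. This is only a diagnostic sketch; the helper name describe_output_encoding is made up for illustration and is not part of any library.

```python
import locale
import sys


def describe_output_encoding():
    """Collect the values Python consults when encoding printed text."""
    return {
        # Encoding derived from the current locale (LC_ALL, LC_CTYPE, LANG).
        "preferred": locale.getpreferredencoding(False),
        # Encoding and error handler attached to standard output, if any.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdout_errors": getattr(sys.stdout, "errors", None),
    }


if __name__ == "__main__":
    for name, value in describe_output_encoding().items():
        print(f"{name}: {value}")
```

In an environment like the one in the question, both encodings would be expected to come back as ANSI_X3.4-1968, which is another name for ASCII.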

Your locale is set to POSIX, which means that the default encoding is ASCII. You'll need to configure that locale to use an encoding that supports all of Unicode. How to do this for Chromebooks, I don't know. On my Mac, the locale is set to en_US.UTF-8, mostly, so all of the Unicode standard is supported by my terminal. You could force the issue by setting export LC_CTYPE=en_US.UTF-8.

You can override Python's choices by setting the PYTHONIOENCODING environment variable.
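As a rough illustration of that override, the sketch below starts a child interpreter with PYTHONIOENCODING set and returns the raw bytes it writes to standard output. The helper name run_with_io_encoding is hypothetical; the only real knob is the environment variable itself, whose value has the form encoding[:errors].

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, io_encoding):
    """Run `code` in a child Python with PYTHONIOENCODING set; return stdout bytes."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # chr(0xD7) is the multiplication sign from the question's string.
    raw = run_with_io_encoding("print('1600' + chr(0xD7) + '1068')", "utf-8")
    print(raw)
```

In a shell, the equivalent would be to export PYTHONIOENCODING=utf-8 before starting Python or IPython. Whether the terminal then displays the resulting bytes correctly still depends on the terminal itself.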

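Note that Python 3 also has the backslashreplace error handler, which replaces any character the output encoding can't handle with the standard \xhh, \uhhhh and \Uhhhhhhhh escape sequences. sys.stderr uses it by default; sys.stdout does not, but you can request it through PYTHONIOENCODING, for example ascii:backslashreplace. On Python 3.7 and later, a C or POSIX locale additionally makes Python fall back to UTF-8 for the standard streams, so the exception should no longer appear there by default. With backslashreplace in effect, instead of an exception you'd see: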

Antelope Canyon, Arizona [1600\xd71068] </a>&#32; <span class="domain">(<a 
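The effect of the different error handlers can be checked without involving the terminal at all, by encoding the string to ASCII by hand. This sketch covers backslashreplace as well as the ignore and replace handlers, which correspond to the "just ignore problematic characters" option raised in the question; the sample string is shortened from the one in the question.

```python
text = "Antelope Canyon, Arizona [1600\u00d71068]"

# The default "strict" handler is what produces the UnicodeEncodeError.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# Each handler below yields plain ASCII bytes instead of raising.
escaped = text.encode("ascii", errors="backslashreplace")
dropped = text.encode("ascii", errors="ignore")
replaced = text.encode("ascii", errors="replace")

print(escaped.decode("ascii"))   # Antelope Canyon, Arizona [1600\xd71068]
print(dropped.decode("ascii"))   # Antelope Canyon, Arizona [16001068]
print(replaced.decode("ascii"))  # Antelope Canyon, Arizona [1600?1068]
```

Note that ignore and replace lose information, so they are best kept for display purposes rather than for data that has to be read back later.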


Answer 2:

After a lot of searching, I found this. As it suggests, you could try:

  1. Edit the /etc/locale.gen file (create it first if it does not exist).
  2. Write the following text in it:

    en_GB.UTF-8 UTF-8
    LC_ALL="en_GB.UTF-8"
    
  3. Try rebooting the Chromebook.

Then check the output of the locale command.