Printing to stdout with encoding in Python 3

Posted 2019-09-05 18:13

I have a Python 3 program that reads some strings from a Windows-1252 encoded file:

with open(file, 'r', encoding="cp1252") as file_with_strings:
    strings = file_with_strings.read().splitlines()  # save some strings

Later I want to write these strings to stdout. I've tried:

print(some_string)
# => UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 180: ordinal not in range(128)

print(some_string.decode("utf-8"))
# => AttributeError: 'str' object has no attribute 'decode'

sys.stdout.buffer.write(some_string)
# => TypeError: 'str' does not support the buffer interface

print(some_string.encode("cp1252").decode("utf-8"))
# => UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 180: invalid continuation byte

print(some_string.encode("cp1252"))
# => has the unfortunate result of printing b'<my string>' instead of just the string

I'm scratching my head here. I'd like to print the string I got from the file exactly as it appears there, in cp1252. (In my terminal, when I run more $file, these characters appear as question marks, so my terminal is probably ASCII.)

Would love some clarification! Thanks!

3 answers
Luminary・发光体
#2 · 2019-09-05 18:47

When you encode with cp1252, you have to decode with the same.

For example:

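import sys

txt = "hi hello\n".encode("cp1252")  # str -> cp1252-encoded bytes
# print(txt.decode("cp1252"))        # decoding with the same codec gives the str back
sys.stdout.buffer.write(txt)         # write the raw bytes, bypassing stdout's text encoding
sys.stdout.flush()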

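This writes the cp1252-encoded bytes of "hi hello\n" straight to stdout without any further decoding by Python; the terminal then has to interpret those bytes as cp1252 for non-ASCII characters to display correctly.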
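A small sketch of the "encode and decode with the same codec" rule, which also shows why the question's .encode("cp1252").decode("utf-8") attempt fails. The sample text "café au lait" is an assumption standing in for the asker's data.

```python
# Sketch: bytes must be decoded with the same codec that produced them.
text = "caf\u00e9 au lait"             # 'café au lait' (assumed sample string)

data = text.encode("cp1252")           # in cp1252, é is the single byte 0xE9
print(data)                            # b'caf\xe9 au lait'

# Same codec on both sides: the original string comes back unchanged.
assert data.decode("cp1252") == text

# Different codec: in UTF-8, 0xE9 starts a 3-byte sequence, but the next
# byte (a space) is not a continuation byte, so decoding fails.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)                  # invalid continuation byte
```

The failure has nothing to do with the terminal: the bytes were produced by one codec and read back with another.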

可以哭但决不认输i
#3 · 2019-09-05 18:57

Either you're piping your script's output or your locale is broken. Fix your environment rather than adapting your script to it; hard-coding an encoding into the script makes it very brittle.

If you're piping, stdout is not a terminal. Python 2 simply assumed "ASCII" for pipes; Python 3 takes the encoding from the locale, so a pipe ends up "ASCII" whenever the locale is.

Under normal conditions, Python uses the locale to work out which encoding to apply to stdout. If your locale is broken (not installed or corrupt), Python falls back to "ASCII". A locale of "C" also gives you "ASCII" on Python 3.6 and earlier; Python 3.7 and later switch to UTF-8 in the "C" locale (PEP 538 and PEP 540).
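To see which of these cases applies, a minimal diagnostic sketch can print the values Python is working from. The function name report_stdout_encoding is hypothetical; the calls it makes (sys.stdout.isatty, sys.stdout.encoding, locale.getpreferredencoding, os.environ) are standard library.

```python
import locale
import os
import sys


def report_stdout_encoding():
    """Collect the settings that decide how print() encodes text (hypothetical helper)."""
    stdout = sys.stdout
    return {
        # True when stdout is a terminal, False when it is piped or redirected.
        "isatty": stdout.isatty(),
        # The codec print() actually uses to turn str into bytes on stdout.
        "stdout_encoding": getattr(stdout, "encoding", None),
        # What the current locale asks for; on POSIX, Python normally derives
        # stdout's codec from this.
        "locale_encoding": locale.getpreferredencoding(False),
        # If set, this overrides the locale-based choice.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for key, value in report_stdout_encoding().items():
        print(f"{key}: {value}")
```

Running it once directly and once piped through cat shows whether piping or the locale is responsible for an "ascii" stdout.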

Check your locale by running locale and making sure it reports no errors, e.g.:

$ locale
LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_ALL=

If all else fails, or if you're piping, you can override Python's encoding detection for stdin, stdout and stderr by setting the PYTHONIOENCODING environment variable, e.g.:

$ PYTHONIOENCODING=utf-8 ./my_python.sh

Remember that your shell has a locale and your terminal has an encoding; both need to be set correctly.
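The effect of PYTHONIOENCODING can be checked from Python itself by starting a child interpreter whose stdout is a pipe. This is a sketch; the helper name run_with_io_encoding is hypothetical, and the sample string 'café' is only an example.

```python
import codecs
import os
import subprocess
import sys


def run_with_io_encoding(code, io_encoding):
    """Run `code` in a child Python with PYTHONIOENCODING set; return its raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,  # the child's stdout is a pipe, not a terminal
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The child reports the codec it chose for stdout (normalized via codecs.lookup).
    name = run_with_io_encoding("import sys; print(sys.stdout.encoding)", "utf-8")
    print(codecs.lookup(name.decode("ascii").strip()).name)  # utf-8

    # With cp1252, the child writes é as the single byte 0xE9.
    raw = run_with_io_encoding("print('caf\\u00e9')", "cp1252")
    print(raw.rstrip(b"\r\n"))  # b'caf\xe9'
```

The \\u00e9 escape keeps the command line itself pure ASCII, so only PYTHONIOENCODING decides how the child encodes the character.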

chillily
#4 · 2019-09-05 19:01

To anybody out there with the same problem, I ended up doing:

import sys

to_print = (some_string + "\n").encode("cp1252")
sys.stdout.buffer.write(to_print)
sys.stdout.flush() # I write a ton of these strings, and segfaulted without flushing
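The same approach can be wrapped in a small helper that also copes with characters cp1252 cannot represent. This is a sketch, not the poster's code: the name write_encoded and the choice of errors="replace" (which writes b"?" instead of raising UnicodeEncodeError) are assumptions; pass errors="strict" to keep the original behavior.

```python
import io
import sys


def write_encoded(text, stream=None, encoding="cp1252", errors="replace"):
    """Encode `text` and write the raw bytes to a binary stream (hypothetical helper).

    `stream` defaults to sys.stdout.buffer. With errors="replace", characters
    the codec cannot represent are written as b"?" instead of raising.
    """
    if stream is None:
        sys.stdout.flush()          # push out text already print()ed so output stays in order
        stream = sys.stdout.buffer  # the byte-level stream underneath sys.stdout
    stream.write(text.encode(encoding, errors))
    stream.flush()


# Usage with an in-memory stream; call write_encoded(text) to target the real stdout.
buf = io.BytesIO()
write_encoded("caf\u00e9 \u2192 ok\n", buf)
print(buf.getvalue())  # b'caf\xe9 ? ok\n'
```

Flushing sys.stdout before touching sys.stdout.buffer matters when print() and raw byte writes are mixed, since the text layer keeps its own buffer.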