I have a Python 3 program that reads some strings from a Windows-1252 encoded file:
with open(file, 'r', encoding="cp1252") as file_with_strings:
    some_string = file_with_strings.read()  # save some strings
I later want to write these strings to stdout. I've tried:
print(some_string)
# => UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 180: ordinal not in range(128)
print(some_string.decode("utf-8"))
# => AttributeError: 'str' object has no attribute 'decode'
sys.stdout.buffer.write(some_string)
# => TypeError: 'str' does not support the buffer interface
print(some_string.encode("cp1252").decode("utf-8"))
# => UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 180: invalid continuation byte
print(some_string.encode("cp1252"))
# => has the unfortunate result of printing b'<my string>' instead of just the string
I'm scratching my head here. I'd like to print the string I got from the file just as it appears there, in cp1252. (In my terminal, when I do more $file, these characters appear as question marks, so my terminal is probably ASCII.)
Would love some clarification! Thanks!
E.g.:
This will print "hi hello\n" (which was encoded in cp1252) after decoding it.
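A minimal sketch of such an example, assuming the bytes literal stands in for data read from the cp1252 file (the helper name decode_cp1252 is made up for illustration):

```python
import sys


def decode_cp1252(raw):
    """Turn cp1252-encoded bytes into a Python 3 str."""
    return raw.decode("cp1252")


if __name__ == "__main__":
    # The payload here is pure ASCII, so it prints even on an ASCII-only stdout.
    sys.stdout.write(decode_cp1252(b"hi hello\n"))
```

Once decoded, the value is an ordinary str; whether non-ASCII characters such as é can then be printed depends on the encoding Python has chosen for stdout.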
You're either piping your script's output or your locale is broken. You should fix your environment rather than adapt your script to it, since working around the environment in code makes the script very brittle.
If you're piping, Python may assume the output should be ASCII and set the encoding of stdout to ASCII (Python 2 falls back to ASCII whenever stdout is not a terminal).
Under normal conditions, Python uses the locale to work out what encoding to apply to stdout. If your locale is broken (not installed or corrupt), Python will default to ASCII. A locale of "C" will also give you an encoding of ASCII.
Check your locale by running the locale command and ensure no errors are returned.
If all else fails or you're piping, you can override Python's locale detection by setting the PYTHONIOENCODING environment variable.
Remember that your shell has a locale and your terminal has an encoding; they both need to be set correctly.
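The checks described above can also be made from inside Python. The following is a minimal sketch, assuming a Python 3 interpreter; the helper names describe_stdout and child_stdout_encoding are made up for illustration. It reports what Python has picked for stdout and shows PYTHONIOENCODING overriding that choice in a child interpreter.

```python
import locale
import os
import subprocess
import sys


def describe_stdout():
    """Collect the settings that influence how stdout is encoded."""
    return {
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "locale_encoding": locale.getpreferredencoding(False),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


def child_stdout_encoding(override):
    """Start a child interpreter with PYTHONIOENCODING set and return
    the stdout encoding that the child reports."""
    env = dict(os.environ, PYTHONIOENCODING=override)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,  # the child's stdout is a pipe, not a terminal
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    for name, value in describe_stdout().items():
        print(name, "=", value)
    print("child with override:", child_stdout_encoding("utf-8"))
```

Setting PYTHONIOENCODING in the shell before launching the script has the same effect as the env dictionary used here. If stdout_encoding comes back as an ASCII variant, the locale is the first thing to check.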
To anybody out there with the same problem, I ended up doing:
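One approach that fits the errors listed in the question (a sketch under that assumption, not necessarily the exact code used) is to encode the string yourself and write the resulting bytes to sys.stdout.buffer, which sidesteps stdout's text encoding entirely. The function name write_encoded is hypothetical.

```python
import io
import sys


def write_encoded(text, encoding="cp1252", stream=None):
    """Encode text explicitly and write the raw bytes to a binary stream."""
    if stream is None:
        stream = sys.stdout.buffer  # binary layer underneath sys.stdout
    stream.write(text.encode(encoding))
    stream.flush()


if __name__ == "__main__":
    # Write into an in-memory buffer to show the exact bytes produced.
    demo = io.BytesIO()
    write_encoded("caf\u00e9\n", stream=demo)
    print(demo.getvalue())  # b'caf\xe9\n'
```

This avoids the TypeError from the question because sys.stdout.buffer receives bytes rather than a str. Whether the terminal then renders those bytes correctly still depends on the terminal's own encoding.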