After learning how to read Unicode files in a Python 3.0 web script, I now need to learn how to use print() with Unicode.
I searched for how to write Unicode; for example, this question explains that you can't write Unicode characters to a non-Unicode console. However, in my case the output goes to Apache, and I am sure that it is capable of handling Unicode text. For some reason, however, the stdout of my web script is in ascii.
Obviously, if I were opening the file for writing myself, I would do something like
open(filename, 'w', encoding='utf8')
but since I'm given an open stream, I resorted to using
sys.stdout.buffer.write(mytext.encode('utf-8'))
and everything seems to work. Does this violate some rule of good behavior, or does it have any unintended consequences?
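One consequence worth knowing about when writing to the byte layer directly: the text layer keeps its own pending data, so anything sent through print() or sys.stdout.write() should be flushed before bytes go to sys.stdout.buffer, or the two can come out in the wrong order. Here is a minimal sketch of that, using an in-memory io.BytesIO as a stand-in for the real stream (the names raw and text and the sample header are only for illustration):

```python
import io

# In-memory stand-in for the byte stream behind sys.stdout (assumption for the demo).
raw = io.BytesIO()
# Stand-in for an ascii-encoded sys.stdout; newline="\n" avoids platform translation.
text = io.TextIOWrapper(raw, encoding="ascii", newline="\n")

# Header written through the text layer, as print() would do.
text.write("Content-Type: text/plain; charset=utf-8\n\n")
# Flush the text layer first so the header reaches the byte stream before the body.
text.flush()

# Body written as UTF-8 bytes straight to the underlying buffer.
text.buffer.write("caf\u00e9\n".encode("utf-8"))
text.buffer.flush()

output = raw.getvalue()
print(output)
```

In a real script the same pattern would be sys.stdout.flush() followed by sys.stdout.buffer.write(...).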
I don't think you're breaking any rule, but
sys.stdout = codecs.EncodedFile(sys.stdout, 'utf8')
looks like it might be handier / less clunky.
Edit: per comments, this isn't quite right -- @Miles gave the right variant (thanks!):
sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
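To see what the getwriter variant does without touching the real stdout, here is a small sketch that wraps an in-memory byte stream instead. The io.BytesIO object is just a stand-in for sys.stdout.buffer, and the sample string is made up:

```python
import codecs
import io

# Stand-in for sys.stdout.buffer so the written bytes can be inspected.
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; calling it wraps a byte stream.
writer = codecs.getwriter("utf8")(raw)

# Text goes in, UTF-8 encoded bytes come out on the wrapped stream.
writer.write("caf\u00e9 \u2603\n")

encoded = raw.getvalue()
print(encoded)
```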
Edit: if you can arrange for the environment variable PYTHONIOENCODING to be set to utf8 when Apache starts your script, that would be even better, since sys.stdout is then set to utf8 automatically; but if that's infeasible or impractical, the codecs solution stands.
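A quick way to check the effect of PYTHONIOENCODING is to start a child interpreter with the variable set and look at what it reports and emits. This is only a sketch: it uses subprocess in place of Apache, and child_code is a throwaway one-liner written for the demo:

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable, as the server would need to.
env = dict(os.environ, PYTHONIOENCODING="utf8")

# The child prints its stdout encoding, then one non-ASCII character (e-acute).
child_code = "import sys; print(sys.stdout.encoding); print('\\u00e9')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

# First item: the encoding name reported by the child; second: the raw bytes it wrote.
lines = result.stdout.split()
print(lines)
```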
This is an old answer but I'll add my version here since I first ventured here before finding my solution.
One issue with codecs.getwriter is that, if you are running a script, the output will be buffered (whereas normally Python's stdout prints after every line).
sys.stdout in the console is an io.TextIOWrapper, so my solution uses that. This also allows you to set line_buffering to True or False.
For example, to make stdout backslash-escape any characters it cannot encode, instead of raising an error:
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding,
errors="backslashreplace", line_buffering=True)
To force a specific encoding (in this case utf8):
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding="utf8",
line_buffering=True)
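To compare the two variants side by side without replacing the real stdout, the sketch below wraps in-memory byte streams instead. The helper name wrap and the use of io.BytesIO are assumptions for the demo, and "ascii" stands in for whatever sys.stdout.encoding happens to be on the server:

```python
import io

def wrap(raw, encoding, errors="strict"):
    """Hypothetical helper: wrap a byte stream the same way the lines above wrap stdout."""
    return io.TextIOWrapper(raw, encoding=encoding, errors=errors,
                            line_buffering=True, newline="\n")

# Variant 1: keep an ascii encoding, but escape what it cannot represent.
ascii_raw = io.BytesIO()
ascii_out = wrap(ascii_raw, "ascii", errors="backslashreplace")
ascii_out.write("caf\u00e9\n")  # line_buffering flushes at the newline
ascii_bytes = ascii_raw.getvalue()

# Variant 2: force utf8 so the character is encoded rather than escaped.
utf8_raw = io.BytesIO()
utf8_out = wrap(utf8_raw, "utf8")
utf8_out.write("caf\u00e9\n")
utf8_bytes = utf8_raw.getvalue()

print(ascii_bytes)
print(utf8_bytes)
```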
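A note: calling sys.stdout.detach() does not close the underlying buffer; it separates the buffer from the old text wrapper and returns it, and the old wrapper is unusable afterwards. Some modules use sys.__stdout__, which holds the original sys.stdout object from interpreter startup (the same object unless sys.stdout has been replaced), so you may want to set that as well: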
sys.stdout = sys.__stdout__ = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding, errors="backslashreplace", line_buffering=True)
sys.stderr = sys.__stderr__ = io.TextIOWrapper(sys.stderr.detach(), encoding=sys.stdout.encoding, errors="backslashreplace", line_buffering=True)
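For anyone wondering what detach() actually leaves behind, here is a small check on an in-memory stream rather than the real stdout (the names raw, old and new are only for this sketch). It looks at whether the buffer is still open, whether the old wrapper still accepts writes, and whether a new wrapper can take over:

```python
import io

# In-memory stand-in for the byte stream behind sys.stdout.
raw = io.BytesIO()
old = io.TextIOWrapper(raw, encoding="ascii")

# detach() hands back the underlying buffer and disconnects the old wrapper.
buffer = old.detach()
still_open = not buffer.closed

# Any later write through the old wrapper is expected to fail.
try:
    old.write("x")
    old_usable = True
except ValueError:
    old_usable = False

# A new wrapper can take over the same buffer with a different encoding.
new = io.TextIOWrapper(buffer, encoding="utf8", line_buffering=True, newline="\n")
new.write("\u00e9\n")
written = raw.getvalue()

print(still_open, old_usable, written)
```

This is why anything still holding a reference to the old wrapper, such as sys.__stdout__, needs to be pointed at the new one.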