Python 3: is using sys.stdout.buffer.write() good style?

Posted 2019-03-20 16:54

Question:

Having learned about reading Unicode files in a Python 3.0 web script, it's now time for me to learn to use print() with Unicode.

I searched for writing Unicode; for example, this question explains that you can't write Unicode characters to a non-Unicode console. However, in my case, the output is given to Apache, and I am sure that it is capable of handling Unicode text. For some reason, however, the stdout of my web script is in ASCII.

Obviously, if I was opening a file to write myself, I would do something like

open(filename, 'w', encoding='utf8')

but since I'm given an open stream, I resorted to using

sys.stdout.buffer.write(mytext.encode('utf-8'))

and everything seems to work. Does this violate some rule of good behavior or has any unintended consequences?
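
The approach above can be sketched against an in-memory stand-in for stdout. One thing it makes visible: sys.stdout is a text layer sitting on top of sys.stdout.buffer, so text written earlier through print() may still be pending when bytes go straight to the buffer, and flushing the text layer first keeps the order intact. The names raw, text and the helper write_utf8 are illustrative assumptions, not part of the question.

```python
import io

def write_utf8(text_stream, s):
    """Write s as UTF-8 bytes to the binary layer underneath text_stream."""
    text_stream.flush()                       # push out text queued by earlier print()/write()
    text_stream.buffer.write(s.encode("utf-8"))
    text_stream.buffer.flush()

# In-memory stand-in for a stdout whose text layer only accepts ASCII.
raw = io.BytesIO()
text = io.TextIOWrapper(io.BufferedWriter(raw), encoding="ascii", newline="\n")

text.write("Content-Type: text/plain; charset=utf-8\n\n")  # ASCII header via the text layer
write_utf8(text, "caf\u00e9\n")                              # non-ASCII body via the byte layer

output = raw.getvalue()
```

In a real script the call would be write_utf8(sys.stdout, mytext); without the first flush() the body bytes could reach the server before the header text.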

Answer 1:

I don't think you're breaking any rule, but

sys.stdout = codecs.EncodedFile(sys.stdout, 'utf8')

looks like it might be handier / less clunky.

Edit: per comments, this isn't quite right -- @Miles gave the right variant (thanks!):

sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer) 
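
To see what the getwriter variant does without touching the real stdout, here is a small sketch that wraps an in-memory byte stream instead (raw stands in for sys.stdout.buffer and is an assumption for illustration). The returned object accepts str and hands encoded bytes to the stream underneath.

```python
import codecs
import io

raw = io.BytesIO()                         # stands in for sys.stdout.buffer
writer = codecs.getwriter("utf-8")(raw)    # StreamWriter: encodes str, writes bytes

writer.write("na\u00efve caf\u00e9\n")     # "naïve café"
print("\u4f60\u597d", file=writer)         # print() only needs a write(str) method

encoded = raw.getvalue()
```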

Edit: if you can arrange for environment variable PYTHONIOENCODING to be set to utf8 when Apache starts your script, that would be even better, making sys.stdout be set to utf8 automatically; but if that's unfeasible or impractical the codecs solution stands.
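
A quick way to check the PYTHONIOENCODING effect locally is to start a child interpreter with the variable set and look at the bytes it emits. This is only a sketch: the child command and variable names are made up for the test, and how the variable gets into Apache's environment depends on the server setup and is not shown here.

```python
import os
import subprocess
import sys

child_code = r"print('caf\u00e9')"    # the child prints a non-ASCII character

# Copy the current environment and add the variable the answer mentions.
env = dict(os.environ, PYTHONIOENCODING="utf8")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)
child_bytes = result.stdout.strip()   # strip the trailing newline
```

If the child's stdout is UTF-8, child_bytes holds the two-byte UTF-8 encoding of the accented character rather than raising UnicodeEncodeError.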



Answer 2:

This is an old answer but I'll add my version here since I first ventured here before finding my solution.

One issue with codecs.getwriter is that, if you are running a script, the output will be buffered (whereas Python's stdout normally prints after every line when attached to a console).

sys.stdout in the console is an io.TextIOWrapper, so my solution uses that. This also allows you to set line_buffering=True or False.
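
The buffering difference can be reproduced with in-memory streams. In this sketch (the helper fresh_stdout_buffer and all variable names are hypothetical), a BufferedWriter over a BytesIO plays the role of sys.stdout.buffer, and we check when the bytes actually reach the bottom layer.

```python
import codecs
import io

def fresh_stdout_buffer():
    """Return (raw, buffered): an in-memory stand-in for sys.stdout.buffer."""
    raw = io.BytesIO()
    return raw, io.BufferedWriter(raw)

# codecs writer: the bytes sit in the BufferedWriter until something flushes it.
raw_a, buf_a = fresh_stdout_buffer()
codec_writer = codecs.getwriter("utf-8")(buf_a)
codec_writer.write("line one\n")
seen_before_flush = raw_a.getvalue()   # nothing has reached raw_a yet
codec_writer.flush()                   # delegated to the underlying BufferedWriter
seen_after_flush = raw_a.getvalue()

# TextIOWrapper with line_buffering=True: flushed whenever a newline is written.
raw_b, buf_b = fresh_stdout_buffer()
wrapper = io.TextIOWrapper(buf_b, encoding="utf-8", newline="\n",
                           line_buffering=True)
wrapper.write("line one\n")
seen_line_buffered = raw_b.getvalue()
```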

For example, to set stdout to, instead of erroring, backslash encode all output:

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding,
                              errors="backslashreplace", line_buffering=True)

To force a specific encoding (in this case utf8):

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding="utf8",
                              line_buffering=True)
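
The effect of the errors and encoding arguments is easy to check on an in-memory stream. The helper encode_through below is a hypothetical name used only for this sketch; it writes a string through a TextIOWrapper and returns the bytes that came out.

```python
import io

def encode_through(encoding, errors, text):
    """Write text through a TextIOWrapper and return the bytes produced."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()

sample = "caf\u00e9 \u2615"   # "café" followed by a hot-beverage symbol

escaped = encode_through("ascii", "backslashreplace", sample)  # escapes, no exception
as_utf8 = encode_through("utf-8", "strict", sample)            # real UTF-8 bytes

try:
    encode_through("ascii", "strict", sample)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True      # the failure the question started from
```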

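A note: calling sys.stdout.detach() does not close the underlying buffer; it separates the buffer from the old TextIOWrapper and leaves that old wrapper unusable. Some modules use sys.__stdout__, which holds the original sys.stdout object from interpreter startup and so would still point at the detached wrapper, so you may want to set that as well: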

sys.stdout = sys.__stdout__ = io.TextIOWrapper(sys.stdout.detach(), encoding=sys.stdout.encoding,
                                               errors="backslashreplace", line_buffering=True)
sys.stderr = sys.__stderr__ = io.TextIOWrapper(sys.stderr.detach(), encoding=sys.stderr.encoding,
                                               errors="backslashreplace", line_buffering=True)
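
What detach() does to the old wrapper can be seen on an in-memory stream. This is a sketch under stated assumptions: rewrap is a hypothetical helper, and original stands in for the object that both sys.stdout and sys.__stdout__ refer to at startup.

```python
import io

def rewrap(stream, encoding=None, errors="backslashreplace"):
    """Return a new line-buffered TextIOWrapper over stream's binary buffer.

    The old wrapper passed in is unusable after this call.
    """
    encoding = encoding or stream.encoding   # read before detaching
    stream.flush()
    return io.TextIOWrapper(stream.detach(), encoding=encoding,
                            errors=errors, line_buffering=True)

# In-memory stand-in for the original stdout object.
raw = io.BytesIO()
original = io.TextIOWrapper(raw, encoding="ascii")
replacement = rewrap(original, encoding="utf-8")

replacement.write("caf\u00e9\n")
buffer_still_open = not raw.closed     # detach() did not close the buffer
written = raw.getvalue()

try:
    original.write("x")                # what code holding the old object would hit
    old_wrapper_usable = True
except ValueError:
    old_wrapper_usable = False
```

In a real script this would be used as sys.stdout = sys.__stdout__ = rewrap(sys.stdout, "utf-8"), so that code reading sys.__stdout__ does not end up with the detached wrapper.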