Setting the default output encoding in Python 2 is a well-known idiom:
sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
This wraps the sys.stdout object in a codec writer that encodes output in UTF-8. However, this technique does not work in Python 3 because sys.stdout.write() expects a str, but the result of encoding is bytes, and an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.
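To make the failure concrete, here is a minimal sketch that reproduces it without touching the real sys.stdout. The names text_stream and wrapped are illustrative only, and an in-memory io.TextIOWrapper is assumed to behave like Python 3's sys.stdout for this purpose.

```python
import codecs
import io

# Stand-in for Python 3's sys.stdout: a text stream layered over bytes.
text_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

# The Python 2 idiom, applied unchanged to a text stream.
wrapped = codecs.getwriter("utf-8")(text_stream)

try:
    # The codec writer encodes the str to bytes, then hands those bytes
    # to text_stream.write(), which only accepts str.
    wrapped.write("caf\u00e9")
    idiom_failed = False
except TypeError as exc:
    idiom_failed = True
    error_message = str(exc)

print("idiom failed:", idiom_failed)
```

The TypeError comes from the text layer rejecting bytes, which is the mismatch described above.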
What is the correct way to do this in Python 3?
Since Python 3.7 you can change the encoding of standard streams with reconfigure(). You can also modify how encoding errors are handled by adding an errors parameter.
Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.
It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.
CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
(To add to the confusion, wsgiref's CGIHandler has been broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)
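As a rough illustration of the bytes-only approach, the sketch below encodes a text body to match its declared charset and writes only bytes. The helper name send_text_response and its parameters are made up for this example. An io.BytesIO stands in for sys.stdout.buffer so the result can be inspected.

```python
import io
import sys


def send_text_response(body, charset="utf-8", out=None):
    """Write a minimal CGI-style response as bytes (hypothetical helper)."""
    if out is None:
        # In a real CGI script the binary layer of stdout would be used.
        out = sys.stdout.buffer
    header = "Content-Type: text/plain; charset={}\r\n\r\n".format(charset)
    out.write(header.encode("ascii"))   # headers are plain ASCII
    out.write(body.encode(charset))     # body encoded to match the charset
    out.flush()


# Demonstration against an in-memory binary stream.
fake_stdout_buffer = io.BytesIO()
send_text_response("caf\u00e9", out=fake_stdout_buffer)
response = fake_stdout_buffer.getvalue()
```

Binary content such as an image would skip the encode step and be written to the same binary stream as-is.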
Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits. Instead, this worked fine for me:
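The snippet that belonged here is not preserved. A plausible sketch, assuming the approach was to layer a fresh io.TextIOWrapper over stdout's binary buffer, could look like the following. The helper name utf8_text_writer is hypothetical, and io.BytesIO stands in for sys.stdout.buffer.

```python
import io


def utf8_text_writer(binary_stream):
    """Return a UTF-8 text stream on top of a binary stream (hypothetical helper)."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


# In a real script this would presumably be:
#     default_out = utf8_text_writer(sys.stdout.buffer)
fake_buffer = io.BytesIO()               # stand-in for sys.stdout.buffer
default_out = utf8_text_writer(fake_buffer)

default_out.write("na\u00efve")
default_out.flush()                      # push buffered text down to the bytes layer
written = fake_buffer.getvalue()
```

One caveat with this sketch: the wrapper should stay referenced for the life of the program, because a TextIOWrapper closes its underlying buffer when it is garbage collected.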
(And, of course, writing to default_out instead of stdout.)
I found this thread while searching for solutions to the same error. An alternative solution to those already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use this is less trouble than swapping sys.stdout after Python is initialized, with the advantage of not having to go and edit the Python code.
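To check the effect of PYTHONIOENCODING without changing the current shell, one option is to start a child interpreter with the variable set and ask it what encoding its stdout ended up with. This is only a sketch: the value utf-8 and the variable names are choices made for the example.

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable for the child only.
child_env = dict(os.environ, PYTHONIOENCODING="utf-8")

# The child reports the encoding Python selected for its stdout.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=child_env,
    stdout=subprocess.PIPE,
    check=True,
)
child_encoding = result.stdout.decode("ascii").strip()
print("child stdout encoding:", child_encoding)
```

In day-to-day use the variable would normally be set in the shell or service configuration that launches the script, so the script itself stays unchanged.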
sys.stdout is in text mode in Python 3. Hence you write unicode to it directly, and the idiom for Python 2 is no longer needed.
Where this would fail in Python 2:
However, it works just dandy in Python 3:
Now if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely in the build of the Python.
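When the detected encoding is the suspect, it can help to inspect the stream's encoding attribute and, on Python 3.7 or later, override it with the reconfigure() method mentioned earlier in this thread. The sketch below uses an in-memory stream created with an ASCII encoding as a stand-in for a mis-detected sys.stdout. All variable names are illustrative.

```python
import io

raw = io.BytesIO()
# Stand-in for a sys.stdout whose encoding was detected as ASCII.
out = io.TextIOWrapper(raw, encoding="ascii")
encoding_before = out.encoding

try:
    out.write("caf\u00e9")            # non-ASCII text cannot be encoded
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Python 3.7+: switch the same stream to UTF-8 in place.
# An errors= argument (for example errors="backslashreplace") could be
# passed here as well to change how unencodable characters are handled.
out.reconfigure(encoding="utf-8")
encoding_after = out.encoding

out.write("caf\u00e9")
out.flush()
utf8_bytes = raw.getvalue()
```

For the real stream the equivalent call would be sys.stdout.reconfigure(encoding="utf-8"), assuming sys.stdout has not been replaced by an object that lacks that method.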
Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout. Therefore, the corresponding idiom for Python 3.1 and later is:
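The original snippet is missing here. A sketch of what the detach()-based idiom presumably looks like follows, shown on an in-memory stream so it can run without replacing the real stdout. The helper name utf8_writer_from is invented for the example.

```python
import codecs
import io


def utf8_writer_from(text_stream):
    """Detach the binary layer and wrap it in a UTF-8 codec writer (hypothetical helper)."""
    binary_stream = text_stream.detach()   # text_stream is unusable after this
    return codecs.getwriter("utf-8")(binary_stream)


# For the real stream the idiom would presumably read:
#     sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
raw = io.BytesIO()
old_stdout = io.TextIOWrapper(raw, encoding="ascii")   # stand-in for sys.stdout
new_stdout = utf8_writer_from(old_stdout)

new_stdout.write("caf\u00e9")
captured = raw.getvalue()
```

Here the codec writer sits on a binary stream, so the bytes it produces are accepted, unlike the Python 2 form that wrapped the text stream directly.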