I'm running a recent Linux system where all my locales are UTF-8:
LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
...
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
Now I want to write UTF-8 encoded content to the console.
Right now Python uses UTF-8 for the filesystem encoding but sticks to ASCII for the default encoding :-(
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'UTF-8'
I thought the best (cleanest) way to do this would be to set the PYTHONIOENCODING
environment variable. But Python seems to ignore it: at least on my system I keep getting ascii
as the default encoding, even after setting the envvar.
# tried this in ~/.bashrc and ~/.profile (also sourced them)
# and on the commandline before running python
export PYTHONIOENCODING=UTF-8
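One way to check whether the envvar reaches Python at all is to spawn a child interpreter with it set (a sketch; PYTHONIOENCODING is read at interpreter startup, and it affects sys.stdout.encoding, not the default encoding):

```python
import os
import subprocess
import sys

# Spawn a fresh interpreter with PYTHONIOENCODING set; the variable is
# only read at interpreter startup, so setting it inside *this* process
# would have no effect on this process's own streams.
env = dict(os.environ, PYTHONIOENCODING="UTF-8")
out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=env,
)
print(out.decode().strip())  # the child's stdout encoding
```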
If I do the following at the start of a script, it works though:
>>> import sys
>>> reload(sys) # to enable `setdefaultencoding` again
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("UTF-8")
>>> sys.getdefaultencoding()
'UTF-8'
But that approach seems unclean. So, what's a good way to accomplish this?
Workaround
Instead of changing the default encoding - which is not a good idea (see mesilliac's answer) - I just wrap sys.stdout
with a StreamWriter
like this:
import codecs, locale
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
See this gist for a small utility function that handles it.
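The gist itself isn't reproduced here, but a small helper in the same spirit might look like this (a sketch with a hypothetical name; on Python 2, sys.stdout is a byte stream, while on Python 3 the underlying byte stream lives in sys.stdout.buffer):

```python
import codecs
import locale
import sys

def unicode_writer(stream=None):
    """Wrap a byte stream in a StreamWriter for the preferred locale encoding.

    Hypothetical helper in the spirit of the gist mentioned above.
    """
    stream = stream or sys.stdout
    # On Python 3, sys.stdout already decodes text; wrap its .buffer instead.
    raw = getattr(stream, "buffer", stream)
    return codecs.getwriter(locale.getpreferredencoding())(raw)

sys.stdout = unicode_writer()
```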
That is: if you have a Unicode string, print it directly. If you have a bytestring, convert it to Unicode first.
Your locale settings (LANG, LC_CTYPE) indicate a utf-8 locale, and therefore (in theory) you could print a utf-8 bytestring directly and it should be displayed correctly in your terminal (if the terminal settings are consistent with the locale settings, and they should be). But you should avoid it: do not hardcode the character encoding of your environment inside your script; print Unicode directly instead.
There are many wrong assumptions in your question:
You do not need to set PYTHONIOENCODING with your locale settings to print Unicode to the terminal. A utf-8 locale supports all Unicode characters, i.e., it works as is.
You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may break if some code (that you do not control) needs to print bytes, and/or it may break while printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout, then use io.TextIOWrapper() instead of the codecs module, like the win-unicode-console package does.
sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING should change sys.getdefaultencoding() is incorrect. You should check sys.stdout.encoding instead.
sys.getdefaultencoding() is not used when you print to the console. It may be used as a fallback on Python 2 if stdout is redirected to a file or pipe, unless PYTHONIOENCODING is set.
Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your data silently and/or break third-party modules that do not expect it. Remember: sys.getdefaultencoding() is used to convert bytestrings (str) to/from unicode in Python 2 implicitly, e.g., "a" + u"b". See also the quote in @mesilliac's answer.
This is how I do it:
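The script itself is not shown in this excerpt; a minimal sketch of the approach being described (assumed Python 2, where sys.setdefaultencoding still exists before site.py deletes it):

```python
#!/usr/bin/env python -S
import sys

# With -S, site.py has not run yet, so setdefaultencoding still exists
# (on Python 2; on Python 3 it is gone entirely, hence the guard).
if hasattr(sys, "setdefaultencoding"):
    sys.setdefaultencoding("utf-8")

import site  # now run the site initialization that -S skipped
```

Note that passing an option through env in a bangline does not work on every platform; on Linux, env traditionally treats "python -S" as a single program name.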
Note the -S in the bangline. That tells Python to not automatically import the site module. The site module is what sets the default encoding and then removes the setdefaultencoding method so it can't be set again, but it will honor an encoding that is already set.
It seems accomplishing this is not recommended.
Fedora suggested using the system locale as the default, but apparently this breaks other things.
Here's a quote from the mailing-list discussion: