I'm using Windows and Linux machines for the same project. The default encoding for stdin on Windows is cp1252, and on Linux it is utf-8.
I would like to change everything to utf-8.
Is it possible? How can I do it?
This question is about Python 2; for Python 3, see Python 3: How to specify stdin encoding
You can do this by not relying on the implicit encoding when printing things. Not relying on that is a good idea in any case -- the implicit encoding is only used when printing to stdout and when stdout is connected to a terminal.
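To see what the implicit encoding is in a given environment, a small check like the following may help. It is only a sketch: describe_stream is a hypothetical helper name, not part of any library. On Python 2 the reported encoding is typically None when stdout is piped or redirected, while Python 3 usually reports an encoding either way.

```python
import sys


def describe_stream(stream):
    """Return (encoding, is_terminal) for a file-like object.

    describe_stream is a hypothetical helper used only for illustration.
    """
    encoding = getattr(stream, 'encoding', None)
    is_terminal = hasattr(stream, 'isatty') and stream.isatty()
    return encoding, is_terminal


encoding, is_terminal = describe_stream(sys.stdout)
# Report on stderr so the message is visible even when stdout is redirected.
sys.stderr.write('stdout encoding=%r, terminal=%r\n' % (encoding, is_terminal))
```

Running the script once directly in a terminal and once piped into another command shows whether the two cases differ on your platform.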
A better approach is to use unicode everywhere, and use codecs.open or codecs.getwriter everywhere. You wrap sys.stdout in an object that automatically encodes your unicode strings into UTF-8 using, for example:
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
This will only work if you use unicode everywhere, though. So, use unicode everywhere. Really, everywhere.
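As a rough illustration of what the wrapper does, the sketch below applies codecs.getwriter to an in-memory byte buffer instead of sys.stdout, so the encoded bytes can be inspected. The function name encode_to_utf8 and the sample text are assumptions made for the example.

```python
import codecs
import io


def encode_to_utf8(text):
    """Write unicode text through a UTF-8 writer and return the raw bytes.

    The BytesIO buffer is a stand-in for sys.stdout.
    """
    buffer = io.BytesIO()
    writer = codecs.getwriter('utf-8')(buffer)
    writer.write(text)  # text must be a unicode string, not encoded bytes
    return buffer.getvalue()


encoded = encode_to_utf8(u'caf\u00e9')
```

The single character U+00E9 comes out as the two bytes C3 A9, which is the encoding step the wrapped sys.stdout performs on every write.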
This is an old question, but just for reference.
To read UTF-8 from stdin, use:
import codecs
import sys

UTF8Reader = codecs.getreader('utf8')
sys.stdin = UTF8Reader(sys.stdin)
# Then, e.g., each line is now a unicode string:
for line in sys.stdin:
    print line.strip()
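The reader can be tried without a real stdin. This sketch assumes an in-memory byte stream (fake_stdin, a made-up name) in place of sys.stdin, and should behave the same on Python 2.7 and Python 3:

```python
import codecs
import io

# Stand-in for sys.stdin: a byte stream holding two UTF-8 encoded lines.
fake_stdin = io.BytesIO(u'na\u00efve\ncaf\u00e9\n'.encode('utf-8'))

# Wrap the byte stream so that iterating over it yields decoded unicode lines.
reader = codecs.getreader('utf8')(fake_stdin)
lines = [line.strip() for line in reader]
```

Each element of lines is a unicode string, so the non-ASCII characters survive intact regardless of the platform's default encoding.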
To write UTF-8 to stdout, use:
import codecs
import sys

UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)
# Then, e.g. (pass unicode strings; non-ASCII byte strings would fail to encode):
print u'Anything'
Python automatically detects the encoding of stdin. The simplest way I have found to specify an encoding when automatic detection isn't working properly is to use the PYTHONIOENCODING environment variable, as in the following example:
pipeline | PYTHONIOENCODING="UTF-8" /path/to/your-script.py
For more information about encoding detection and this variable on different platforms you can look at the sys.stdin documentation.
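One way to check the effect of the variable is to start a child interpreter with it set and ask what encoding it reports. This is a minimal sketch; child_stdout_encoding is a hypothetical helper name, and it assumes sys.executable points at a working interpreter.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and return the
    encoding it reports for its own stdout."""
    env = dict(os.environ)
    env['PYTHONIOENCODING'] = io_encoding
    output = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
    )
    return output.decode('ascii').strip()


reported = child_stdout_encoding('utf-8')
```

The child's stdout is a pipe here, which is the case where automatic detection tends to be least helpful, so the reported value reflects the environment variable rather than a terminal setting.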
A simple snippet that works for me on Ubuntu with Python 2.7 and Python 3.6:
import codecs
import sys

if sys.version_info.major == 2:  # Python 2: sys.stdin/sys.stdout are byte streams
    # for stdin
    UTF8Reader = codecs.getreader('utf8')
    sys.stdin = UTF8Reader(sys.stdin)
    # for stdout
    UTF8Writer = codecs.getwriter('utf8')
    sys.stdout = UTF8Writer(sys.stdout)
elif sys.version_info.major == 3:  # Python 3: wrap the underlying binary buffers
    # for stdin
    UTF8Reader = codecs.getreader('utf8')
    sys.stdin = UTF8Reader(sys.stdin.buffer)
    # for stdout
    UTF8Writer = codecs.getwriter('utf8')
    sys.stdout = UTF8Writer(sys.stdout.buffer)