The goal is to continuously read from stdin and enforce UTF-8 decoding, in both Python 2 and Python 3.
I've tried solutions from:
- Writing bytes to standard output in a way compatible with both, python2 and python3
- Python 3: How to specify stdin encoding
I've tried:
#!/usr/bin/env python
from __future__ import print_function, unicode_literals
import io
import sys

# Supports Python2 read from stdin and Python3 read from stdin.buffer
# https://stackoverflow.com/a/23932488/610569
user_input = getattr(sys.stdin, 'buffer', sys.stdin)

# Enforcing utf-8 in Python3
# https://stackoverflow.com/a/16549381/610569
with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
    for line in fin:
        # Reads the input line by line
        # and does something, e.g. just prints the line.
        print(line)
The code works in Python 3, but in Python 2 the file object behind sys.stdin doesn't implement the methods io.TextIOWrapper expects (such as readable()), and it throws:
Traceback (most recent call last):
  File "testfin.py", line 12, in <module>
    with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
AttributeError: 'file' object has no attribute 'readable'
That's because in Python 3 the user_input, i.e. sys.stdin.buffer, is an _io.BufferedReader object whose attributes include readable:
<class '_io.BufferedReader'>
['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_dealloc_warn', '_finalizing', 'close', 'closed', 'detach', 'fileno', 'flush', 'isatty', 'mode', 'name', 'peek', 'raw', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']
While in Python 2 the user_input is a file object whose attributes don't include readable:
<type 'file'>
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
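A quick way to see this difference without dumping dir() is a small sketch like the following (the hasattr check mirrors what io.TextIOWrapper relies on):

import sys

user_input = getattr(sys.stdin, 'buffer', sys.stdin)
# Python 3: True  -- _io.BufferedReader implements readable()
# Python 2: False -- the built-in file type has no such attribute
print(hasattr(user_input, 'readable'))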
If you don't need a fully-fledged io.TextIOWrapper, but just a decoded stream for reading, you can use codecs.getreader() to create a decoding wrapper:
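(The original snippet isn't preserved in this copy; the following is a minimal sketch of the codecs.getreader() approach, reusing the user_input variable from the question.)

import codecs
import sys

# Python 3 exposes the raw bytes as sys.stdin.buffer; Python 2's sys.stdin
# already yields bytes, so fall back to it directly.
user_input = getattr(sys.stdin, 'buffer', sys.stdin)

# getreader('utf8') returns a StreamReader factory; calling it with the
# byte stream yields a file-like object that produces decoded text.
fin = codecs.getreader('utf8')(user_input)

for line in fin:
    print(line)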
codecs.getreader('utf8') creates a factory for a codecs.StreamReader, which is then instantiated using the original stream. I'm not sure the StreamReader supports the with context, but this might not be strictly necessary (there's no need to close STDIN after reading, I guess...). I've successfully used this solution in situations where the underlying stream only offers a very limited interface.
Update (2nd version)
From the comments, it became clear that you actually need an io.TextIOWrapper to have proper line buffering etc. in interactive mode; codecs.StreamReader only works for piped input and the like. Using this answer, I was able to get interactive input to work properly:
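(The linked snippet isn't reproduced here; the following is a sketch of the approach, where the io.open() fallback for Python 2 and line_buffering=True are assumptions about how the binary buffer is obtained and configured.)

from __future__ import print_function
import io
import sys

try:
    # Python 3: the underlying byte stream is exposed as sys.stdin.buffer.
    stdin_buffer = sys.stdin.buffer
except AttributeError:
    # Python 2: the plain file object lacks readable(), so reopen the
    # stdin file descriptor in binary mode via the io module instead.
    stdin_buffer = io.open(sys.stdin.fileno(), 'rb', closefd=False)

# line_buffering=True is assumed here to get line-by-line behaviour
# in interactive mode.
fin = io.TextIOWrapper(stdin_buffer, encoding='utf-8', line_buffering=True)

for line in fin:
    print(line)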
This creates an io.TextIOWrapper with enforced encoding from the binary STDIN buffer.

Have you tried forcing utf-8 encoding in Python as follows:
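(The snippet that followed isn't preserved here; one common way to force UTF-8 decoding of stdin, offered only as a sketch rather than the original suggestion, is to rebind sys.stdin to a codecs reader.)

import codecs
import sys

# Rebind sys.stdin so that every read returns UTF-8-decoded text.
# On Python 3 the raw bytes live in sys.stdin.buffer; on Python 2
# sys.stdin itself is already a byte stream.
byte_stream = getattr(sys.stdin, 'buffer', sys.stdin)
sys.stdin = codecs.getreader('utf-8')(byte_stream)

for line in sys.stdin:
    print(line)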