How to read inputs from stdin and enforce an encoding

Posted 2019-07-15 06:15

The goal is to continuously read from stdin and enforce utf8 in both Python2 and Python3.

I've tried several existing solutions; this is my current attempt:

#!/usr/bin/env python

from __future__ import print_function, unicode_literals
import io
import sys

# Supports Python2 read from stdin and Python3 read from stdin.buffer
# https://stackoverflow.com/a/23932488/610569
user_input = getattr(sys.stdin, 'buffer', sys.stdin)


# Enforcing utf-8 in Python3
# https://stackoverflow.com/a/16549381/610569
with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
    for line in fin:
        # Reads the input line by line
        # and do something, for e.g. just print line.
        print(line)

The code works in Python 3, but in Python 2 io.TextIOWrapper cannot wrap the built-in file object returned by sys.stdin, and it throws:

Traceback (most recent call last):
  File "testfin.py", line 12, in <module>
    with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
AttributeError: 'file' object has no attribute 'readable'

That's because in Python 3 user_input, i.e. sys.stdin.buffer, is an _io.BufferedReader object, which has a readable attribute:

<class '_io.BufferedReader'>

['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_dealloc_warn', '_finalizing', 'close', 'closed', 'detach', 'fileno', 'flush', 'isatty', 'mode', 'name', 'peek', 'raw', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']

In Python 2, on the other hand, user_input is a file object, which has no readable attribute:

<type 'file'>

['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
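
The difference between the two objects can be checked without running both interpreters. The following is only a minimal sketch: can_wrap and LegacyFile are hypothetical names, LegacyFile merely imitates a Python 2 file object, and the check looks only at the readable attribute named in the traceback, not at everything io.TextIOWrapper may require.

```python
from __future__ import print_function
import io


def can_wrap(stream):
    """Hypothetical check: does the stream expose a callable readable()?"""
    return callable(getattr(stream, 'readable', None))


class LegacyFile(object):
    """Stand-in for a Python 2 file object: it has read() but no readable()."""

    def read(self, size=-1):
        return b''


io_style = can_wrap(io.BytesIO(b'abc'))   # io-module stream
legacy_style = can_wrap(LegacyFile())     # imitation of the Python 2 file type

print(io_style, legacy_style)
```

A stream that fails this check needs a different route to a decoded stream than passing it straight to io.TextIOWrapper.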

2 Answers
戒情不戒烟
#2 · 2019-07-15 06:52

If you don't need a fully-fledged io.TextIOWrapper, but just a decoded stream for reading, you can use codecs.getreader() to create a decoding wrapper:

import codecs

# user_input is the binary stdin object from the question.
reader = codecs.getreader('utf8')(user_input)
for line in reader:
    # do whatever you need...
    print(line)

codecs.getreader('utf8') returns a factory for a codecs.StreamReader, which is then instantiated with the original stream. I'm not sure whether StreamReader supports the with statement, but that might not be strictly necessary (there's probably no need to close stdin after reading).

I've successfully used this solution in situations where the underlying stream only offers a very limited interface.
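
As a rough illustration of the codecs.getreader() approach, here is a self-contained sketch that can be run without typing anything on stdin. The helper name decode_lines is hypothetical, and an io.BytesIO object stands in for the real binary stdin; both are assumptions made for the example only.

```python
from __future__ import print_function, unicode_literals
import codecs
import io


def decode_lines(binary_stream, encoding='utf-8'):
    """Yield decoded text lines from a binary stream (hypothetical helper)."""
    reader = codecs.getreader(encoding)(binary_stream)
    for line in reader:
        yield line


# io.BytesIO stands in for sys.stdin (Python 2) or sys.stdin.buffer (Python 3).
fake_stdin = io.BytesIO('h\u00e9llo\nw\u00f6rld\n'.encode('utf-8'))
lines = list(decode_lines(fake_stdin))

# Each item is a text (unicode) string with its trailing newline kept.
print(len(lines))
```

In a real script, the fake_stdin argument would be replaced by getattr(sys.stdin, 'buffer', sys.stdin) as in the question.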

Update (2nd version)

From the comments, it became clear that you actually need an io.TextIOWrapper to have proper line buffering etc. in interactive mode; codecs.StreamReader only works for piped input and the like.

Using this answer, I was able to get interactive input to work properly:

#!/usr/bin/env python
# coding: utf8

from __future__ import print_function, unicode_literals
import io
import sys

user_input = getattr(sys.stdin, 'buffer', sys.stdin)

with io.open(user_input.fileno(), encoding='utf8') as f:
    for line in f:
        # do whatever you need...
        print(line)

This opens the file descriptor of the binary stdin as a new io.TextIOWrapper with the encoding enforced.
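
The same io.open() call can be tried without an interactive terminal. In this sketch an os.pipe() stands in for stdin, which is an assumption made purely so the example is reproducible; the names read_fd, write_fd, wrapper_type and decoded are illustrative.

```python
from __future__ import print_function, unicode_literals
import io
import os

# A pipe stands in for stdin so the sketch runs non-interactively.
read_fd, write_fd = os.pipe()
os.write(write_fd, 'caf\u00e9\n\u4f60\u597d\n'.encode('utf-8'))
os.close(write_fd)  # closing the write end signals end of input

# io.open() accepts an integer file descriptor and returns a text-mode
# wrapper that decodes with the requested encoding.
with io.open(read_fd, encoding='utf-8') as f:
    wrapper_type = type(f).__name__
    decoded = [line.rstrip('\n') for line in f]

print(wrapper_type, len(decoded))
```

Note that leaving the with block closes the underlying descriptor, so with real stdin this pattern suits scripts that read everything in one loop.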

虎瘦雄心在
#3 · 2019-07-15 06:57

Have you tried forcing UTF-8 encoding in Python as follows:

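# Python 2 only: in Python 3, reload() is not a builtin and
# sys.setdefaultencoding() does not exist.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')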