Default encoding for Python's stderr?

Posted 2019-04-26 18:48

Question:

I've got a noisy Python script that I want to silence by redirecting its stderr output to /dev/null (using bash, BTW).

Like so:

python -u parse.py  1> /tmp/output3.txt 2> /dev/null

but it exits prematurely almost at once. Hmm. I can't see the traceback because, of course, that goes to stderr too. It runs noisily and normally if I don't redirect stderr anywhere.

So let's try redirecting it to a file somewhere rather than /dev/null, and take a look at what it's outputting:

python -u parse.py  1> /tmp/output3.txt 2> /tmp/foo || tail /tmp/foo

Traceback (most recent call last):
  File "parse.py", line 79, in <module>
    parseit('pages-articles.xml')
  File "parse.py", line 33, in parseit
    print >>sys.stderr, "bad page title", page_title
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

So the stderr output contains non-ASCII (UTF-8) text, and for some reason Python refuses to print non-ASCII characters when stderr is redirected, even when it's redirected to /dev/null (though of course Python doesn't know that).

How can I silence the stderr of a Python script even though it contains UTF-8? Is there any way to do it without rewriting every print to stderr in this script?

Answer 1:

You can silence stderr by binding it to a custom writer:

#!/usr/bin/env python
import codecs, sys

class NullWriter:
    def write(self, *args, **kwargs):
        pass

if len(sys.argv) == 2:
    if sys.argv[1] == '1':
        sys.stderr = NullWriter()
    elif sys.argv[1] == '2':
        # NOTE: sys.stderr.encoding is *read-only*,
        #       therefore the whole stderr should be replaced
        # encode all output using 'utf8'
        sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print >>sys.stderr, u"\u20AC" # euro sign
print "ok"

Example:

$ python silence_stderr.py
Traceback (most recent call last):
  File "silence_stderr.py", line 11, in <module>
    print >>sys.stderr, u"\u20AC"
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

Silenced stderr:

$ python silence_stderr.py 1
ok

Encoded stderr:

$ python silence_stderr.py 2
€
ok

NOTE: I got the above output inside Emacs; to emulate it in a terminal you could do:

$ python ... 2>out.txt
$ cat out.txt

NOTE: In the Windows console (after chcp 65001, which switches it to 'utf-8', and with a TrueType font (Lucida Console)) I got strange results:

C:\> python silence_stderr.py 2
Traceback (most recent call last):
  File "silence_stderr.py", line 14, in <module>
    print >>sys.stderr, u"\u20AC" # euro sign
  File "C:\pythonxy\python\lib\codecs.py", line 304, in write
    self.stream.write(data)
IOError: [Errno 13] Permission denied

If the font is not TrueType, no exception is raised, but the output is wrong.

Perl works with the TrueType font:

C:\> perl  -E"say qq(\x{20ac})"
Wide character in print at -e line 1.
€

Redirection works though:

C:\>python silence_stderr.py 2 2>tmp.log
ok
C:\>cat tmp.log
€
cat: write error: Permission denied

Re: comment

From the codecs.getwriter documentation:

Look up the codec for the given encoding and return its StreamWriter class or factory function. Raises a LookupError in case the encoding cannot be found.

An oversimplified view:

import sys

class UTF8StreamWriter:
    def __init__(self, writer):
        self.writer = writer  # the underlying byte stream
    def write(self, s):
        # encode unicode text as UTF-8 before writing it out
        self.writer.write(s.encode('utf-8'))

sys.stderr = UTF8StreamWriter(sys.stderr)
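
To check the idea without touching the real stderr, the same wrapper can be pointed at an in-memory byte stream. This is only a minimal sketch: io.BytesIO and the name fake_stderr are stand-ins chosen for the demo, not part of the answer above, and the code is written so that it should run on both Python 2 and Python 3.

```python
import io


class UTF8StreamWriter(object):
    """Encode text as UTF-8 before handing it to a byte stream."""

    def __init__(self, writer):
        self.writer = writer  # any object with a write(bytes) method

    def write(self, s):
        self.writer.write(s.encode("utf-8"))


# io.BytesIO plays the role of a redirected stderr (an assumption for the demo).
fake_stderr = io.BytesIO()
UTF8StreamWriter(fake_stderr).write(u"\u20ac\n")  # euro sign plus newline

# The bytes that would have reached the file or /dev/null.
encoded = fake_stderr.getvalue()
```

The euro sign arrives as its three-byte UTF-8 sequence, so no ASCII encoding step is ever attempted.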


Answer 2:

When stderr is not redirected, it takes on the encoding of your terminal. Once you redirect it, though, there is no terminal encoding to inherit, and Python 2 falls back to ASCII, which is the codec named in the traceback. You'll need to use sys.stderr.isatty() to detect whether it's redirected and encode accordingly.
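
A rough sketch of that check follows. The helper name choose_encoding, the 'utf-8' fallback, and the FakeRedirectedStream class are all hypothetical, introduced only to make the example self-contained; in a real script you would pass sys.stderr instead.

```python
def choose_encoding(stream, fallback="utf-8"):
    """Return the terminal's encoding for a tty, otherwise the fallback."""
    is_tty = getattr(stream, "isatty", lambda: False)()
    encoding = getattr(stream, "encoding", None)
    if is_tty and encoding:
        return encoding
    return fallback


class FakeRedirectedStream(object):
    """Hypothetical stand-in for a stderr that was redirected to a file."""
    encoding = None  # assumed: no terminal encoding available

    def isatty(self):
        return False


# A redirected stream gets the fallback rather than ASCII.
redirected_encoding = choose_encoding(FakeRedirectedStream())
message = u"bad page title \u20ac".encode(redirected_encoding)
```

Choosing UTF-8 as the fallback is a judgment call; any encoding that can represent the page titles would do.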



Answer 3:

You could also just encode the string as ASCII, replacing any Unicode characters that have no ASCII equivalent. Then you don't have to worry about what kind of terminal you have.

asciiTitle = page_title.encode("ascii", "backslashreplace")
print >>sys.stderr, "bad page title", asciiTitle

That replaces the characters that can't be encoded with backslash escapes, e.g. \xfc. There are some other error-handling options too, described here:

http://docs.python.org/library/stdtypes.html#str.encode
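
For comparison, here is a small sketch of what the standard error handlers do to the same string. The sample title is made up for illustration; the handler names are the ones accepted by encode.

```python
# Hypothetical sample title containing two non-ASCII characters.
title = u"M\u00fcnchen \u20ac"

handlers = ["ignore", "replace", "xmlcharrefreplace", "backslashreplace"]
results = dict((name, title.encode("ascii", name)) for name in handlers)

# results["ignore"]            is b"Mnchen "
# results["replace"]           is b"M?nchen ?"
# results["xmlcharrefreplace"] is b"M&#252;nchen &#8364;"
# results["backslashreplace"]  is b"M\\xfcnchen \\u20ac"
```

Only backslashreplace and xmlcharrefreplace keep enough information to recover the original characters, which may matter if the log is read later.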