I've got a noisy Python script that I want to silence by redirecting its stderr to /dev/null (using bash, BTW).
Like so:
python -u parse.py 1> /tmp/output3.txt 2> /dev/null
but it exits prematurely almost immediately. Hmm. I can't see the traceback because, of course, that goes to stderr as well. It runs noisily but normally if I don't redirect stderr anywhere.
So let's try redirecting it to a file somewhere rather than /dev/null, and take a look at what it's outputting:
python -u parse.py 1> /tmp/output3.txt 2> /tmp/foo || tail /tmp/foo
Traceback (most recent call last):
  File "parse.py", line 79, in <module>
    parseit('pages-articles.xml')
  File "parse.py", line 33, in parseit
    print >>sys.stderr, "bad page title", page_title
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
So the text being written to stderr contains non-ASCII characters, and for some reason Python refuses to print non-ASCII text when stderr is redirected, even though it's going to /dev/null (though of course Python doesn't know that).
How can I silence the stderr of a python script even though it contains utf8? Is there any way to do it without re-writing every print to stderr in this script?
You can silence stderr by binding it to a custom writer:
#!/usr/bin/env python
import codecs, sys

class NullWriter:
    def write(self, *args, **kwargs):
        pass

if len(sys.argv) == 2:
    if sys.argv[1] == '1':
        sys.stderr = NullWriter()
    elif sys.argv[1] == '2':
        # NOTE: sys.stderr.encoding is *read-only*,
        #       therefore the whole stderr should be replaced
        # encode all output using 'utf8'
        sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print >>sys.stderr, u"\u20AC" # euro sign
print "ok"
Example:
$ python silence_stderr.py
Traceback (most recent call last):
  File "silence_stderr.py", line 11, in <module>
    print >>sys.stderr, u"\u20AC"
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
Silenced stderr:
$ python silence_stderr.py 1
ok
Encoded stderr:
$ python silence_stderr.py 2
€
ok
NOTE: I got the above output inside Emacs; to emulate it in a terminal you could do:
$ python ... 2>out.txt
$ cat out.txt
NOTE: Inside the Windows console (after chcp 65001, which switches to 'utf-8', and with a TrueType font (Lucida Console)) I've got strange results:
C:\> python silence_stderr.py 2
Traceback (most recent call last):
  File "silence_stderr.py", line 14, in <module>
    print >>sys.stderr, u"\u20AC" # euro sign
  File "C:\pythonxy\python\lib\codecs.py", line 304, in write
    self.stream.write(data)
IOError: [Errno 13] Permission denied
If the font is not TrueType then no exception is raised, but the output is wrong.
Perl works for the truetype font:
C:\> perl -E"say qq(\x{20ac})"
Wide character in print at -e line 1.
€
Redirection works though:
C:\>python silence_stderr.py 2 2>tmp.log
ok
C:\>cat tmp.log
€
cat: write error: Permission denied
Re: comment
From the codecs.getwriter documentation:
Look up the codec for the given encoding and return its StreamWriter class or factory function. Raises a LookupError in case the encoding cannot be found.
An oversimplified view:
import sys

class UTF8StreamWriter:
    def __init__(self, writer):
        self.writer = writer
    def write(self, s):
        # encode each unicode string to UTF-8 bytes before passing it on
        self.writer.write(s.encode('utf-8'))

sys.stderr = UTF8StreamWriter(sys.stderr)
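If you only want to silence part of the script rather than flip a switch on the command line, the NullWriter idea can be wrapped in a context manager. This is just a sketch: silenced_stderr is a made-up helper name, not something from the standard library, and I've added a flush() method on the assumption that some code may call it on stderr.

```python
import contextlib
import sys


class NullWriter(object):
    """File-like sink that discards everything written to it."""

    def write(self, *args, **kwargs):
        pass

    def flush(self):
        pass


@contextlib.contextmanager
def silenced_stderr():
    """Temporarily replace sys.stderr with a NullWriter (hypothetical helper)."""
    saved = sys.stderr
    sys.stderr = NullWriter()
    try:
        yield
    finally:
        # Restore the real stderr even if the wrapped code raises.
        sys.stderr = saved


with silenced_stderr():
    # Discarded without any encoding being attempted, so no UnicodeEncodeError.
    sys.stderr.write(u"bad page title \u20ac\n")
```

Keep in mind this only swallows writes that go through Python's sys.stderr object. Anything a subprocess or C extension writes straight to file descriptor 2 still gets through, so the shell's 2> /dev/null may still be wanted for that.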
When stderr is not redirected, Python uses the encoding of your terminal for it. This all goes out the door when you redirect it, though: sys.stderr.encoding is then None and Python 2 falls back to ASCII. You'll need to use sys.stderr.isatty() to detect whether it's redirected and encode appropriately.
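A rough sketch of that isatty() check, assuming UTF-8 is what you want in the redirected output. wrap_if_redirected is a name I made up for illustration; the getattr fallback is there because Python 3 text streams expose their byte stream as .buffer while Python 2 file objects accept bytes directly.

```python
import codecs
import io
import sys


def wrap_if_redirected(stream, encoding="utf-8"):
    """Return stream unchanged if it is a terminal, else an encoding writer.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    if stream.isatty():
        # A terminal: leave it alone so Python keeps using the terminal encoding.
        return stream
    # Redirected: encode explicitly instead of relying on the ASCII fallback.
    raw = getattr(stream, "buffer", stream)
    return codecs.getwriter(encoding)(raw)


# Small demonstration with an in-memory byte stream standing in for a
# redirected stderr (BytesIO.isatty() returns False).
fake_redirect = io.BytesIO()
writer = wrap_if_redirected(fake_redirect)
writer.write(u"bad page title \u20ac\n")
```

In a real script you would call sys.stderr = wrap_if_redirected(sys.stderr) once near the top, before the first print to stderr, and leave the existing print statements untouched.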
You could also just encode the string as ASCII, replacing unicode characters that don't map. Then you don't have to worry about what kind of terminal you have.
asciiTitle = page_title.encode("ascii", "backslashreplace")
print >>sys.stderr, "bad page title", asciiTitle
That replaces the characters that can't be encoded with backslash escapes, e.g. \xfc. There are some other replacement options too, described here:
http://docs.python.org/library/stdtypes.html#str.encode
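To get a feel for how those options differ, here is a small comparison. The sample title and the to_ascii helper are made up for illustration; the error handler names are the standard ones accepted by encode().

```python
# Sample text with two non-ASCII characters: u-umlaut (U+00FC) and euro (U+20AC).
title = u"M\xfcnchen \u20ac"


def to_ascii(text, errors):
    """Encode text as ASCII using the given error handler (illustrative helper)."""
    return text.encode("ascii", errors)


escaped = to_ascii(title, "backslashreplace")   # M\xfcnchen \u20ac (literal backslashes)
replaced = to_ascii(title, "replace")           # M?nchen ?
ignored = to_ascii(title, "ignore")             # Mnchen (plus the trailing space)
xmlref = to_ascii(title, "xmlcharrefreplace")   # M&#252;nchen &#8364;
```

backslashreplace and xmlcharrefreplace keep enough information to recover the original characters later, while replace and ignore lose it, so for debugging output like "bad page title" one of the first two is probably the safer pick.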