I'm using this code to get standard output from an external program:
>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
The communicate() method returns an array of bytes:
>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n'
However, I'd like to work with the output as a normal Python string, so that I could print it like this:
>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2
I thought that's what the binascii.b2a_qp() method is for, but when I tried it, I got the same byte array again:
>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n'
Does anybody know how to convert the bytes value back to string? I mean, using the "batteries" instead of doing it manually. And I'd like it to be ok with Python 3.
I think this way is easy:
I made a function to clean a list
If you don't know the encoding, then to read binary input into a string in a way that works on both Python 3 and Python 2, use the ancient MS-DOS cp437 encoding:
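A minimal sketch of that idea, written for Python 3. The sample bytes and the variable names are made up for illustration. It relies on Python's cp437 codec having a mapping for every byte value, so the decode step should not raise:

```python
# Build a sample containing every possible byte value (0-255).
raw = bytes(range(256))

# cp437 assigns a character to each byte, so this does not raise.
text = raw.decode('cp437')

# Encoding back with the same codec recovers the original bytes.
roundtrip = text.encode('cp437')

# ASCII bytes keep their usual meaning; only the high bytes map to
# cp437-specific characters.
ascii_part = text[:128]
print(len(text), roundtrip == raw)
```

The trade-off is that bytes above 0x7f show up as box-drawing and accented characters that probably have nothing to do with what the producer of the data meant.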
Because encoding is unknown, expect non-English symbols to translate to characters of cp437 (English chars are not translated, because they match in most single byte encodings and UTF-8).
Decoding arbitrary binary input to UTF-8 is unsafe, because you may get this:
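To illustrate the kind of failure meant here, the following sketch feeds a byte that cannot start a UTF-8 sequence to the decoder and captures the exception. The sample input is an assumption chosen to trigger the error, not data from the question:

```python
# 0x80 is a continuation byte, so it is invalid as the first byte
# of a UTF-8 sequence.
raw = b'\x80abc'

try:
    raw.decode('utf-8')
    error_message = None
except UnicodeDecodeError as exc:
    # Keep the message so it can be inspected or logged.
    error_message = str(exc)

print(error_message)
```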
The same applies to latin-1, which was popular (default?) for Python 2. See the missing points in Codepage Layout - it is where Python chokes with the infamous "ordinal not in range".
UPDATE 20150604: There are rumors that Python 3 has the surrogateescape error strategy for encoding stuff into binary data without data loss and crashes, but it needs conversion tests, [binary] -> [str] -> [binary], to validate both performance and reliability.
UPDATE 20170116: Thanks to a comment by Nearoo - there is also a possibility to slash escape all unknown bytes with the backslashreplace error handler. That works only for Python 3, so even with this workaround you will still get inconsistent output from different Python versions.
See https://docs.python.org/3/howto/unicode.html#python-s-unicode-support for details.
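Both error handlers mentioned in the updates can be tried on a small sample. This is only a sketch for Python 3 (decoding with backslashreplace needs 3.5 or later). The input bytes are invented for the example, and a single round trip like this is no substitute for the broader conversion tests mentioned above:

```python
# Two bytes that are never valid in UTF-8, surrounded by plain ASCII.
raw = b'abc\xff\xfedef'

# backslashreplace: undecodable bytes become visible \xNN escapes.
escaped = raw.decode('utf-8', errors='backslashreplace')

# surrogateescape: undecodable bytes are smuggled into the string as
# lone surrogates, so encoding with the same handler restores them.
smuggled = raw.decode('utf-8', errors='surrogateescape')
restored = smuggled.encode('utf-8', errors='surrogateescape')

print(escaped)
print(restored == raw)
```

Note that the surrogateescape string is not safe to print or write with a strict encoder; it is mainly useful when the data has to be turned back into the original bytes later.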
UPDATE 20170119: I decided to implement slash escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.
You need to decode the bytes object to produce a string:
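As a sketch, here is bytes.decode() applied to output shaped like the listing in the question. UTF-8 is an assumption here; use whatever encoding the external program actually writes:

```python
# Sample bytes modelled on the question's output (not a real `ls` call).
command_stdout = b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n'

# Assumption: the program emits UTF-8.
text = command_stdout.decode('utf-8')

# text is now a str, so print() shows real line breaks instead of \n.
print(text)
```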
You need to decode the byte string and turn it into a character (unicode) string, either with the bytes.decode() method or by passing the bytes and an encoding to str().
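The two spellings of the conversion, the decode() method and the str() constructor with an explicit encoding, can be compared side by side in Python 3. The sample value and variable names are illustrative, and UTF-8 is again an assumption:

```python
encoding = 'utf-8'  # assumption: replace with the real encoding
raw = b'hello'

# Method form.
via_method = raw.decode(encoding)

# Constructor form; the encoding argument is what triggers decoding.
via_constructor = str(raw, encoding)

# Without an encoding, str() returns the repr of the bytes object,
# which is rarely what you want.
without_encoding = str(raw)

print(via_method, via_constructor, without_encoding)
```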