We're preparing to move to Python 3.4 and have added unicode_literals. Our code relies extensively on piping to and from external utilities using the subprocess module. The following snippet works fine on Python 2.7 to pipe UTF-8 strings to a subprocess:
import subprocess

kw = {}
kw[u'stdin'] = subprocess.PIPE
kw[u'stdout'] = subprocess.PIPE
kw[u'stderr'] = subprocess.PIPE
kw[u'executable'] = u'/path/to/binary/utility'
args = [u'', u'-l', u'nl']
line = u'¡Basta Ya!'
popen = subprocess.Popen(args,**kw)
popen.stdin.write('%s\n' % line.encode(u'utf-8'))
...blah blah...
The following changes throw this error:
from __future__ import unicode_literals
import subprocess

kw = {}
kw[u'stdin'] = subprocess.PIPE
kw[u'stdout'] = subprocess.PIPE
kw[u'stderr'] = subprocess.PIPE
kw[u'executable'] = u'/path/to/binary/utility'
args = [u'', u'-l', u'nl']
line = u'¡Basta Ya!'
popen = subprocess.Popen(args,**kw)
popen.stdin.write('%s\n' % line.encode(u'utf-8'))
Traceback (most recent call last):
  File "test.py", line 138, in <module>
    exitcode = main()
  File "test.py", line 57, in main
    popen.stdin.write('%s\n' % line.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Any suggestions to pass UTF-8 through the pipe?
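'%s\n' is a unicode string when you use unicode_literals. What happens is that your encoded line value is implicitly decoded again (with the ascii codec) so that it can be interpolated into the unicode '%s\n' string, and that decode fails on the first non-ASCII byte, 0xc2.
You'll have to use a byte string instead. Either prefix the format string with b, as in b'%s\n' % line.encode('utf-8'), or encode after interpolation, as in ('%s\n' % line).encode('utf-8').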
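In Python 3, you'll have to write bytestrings to pipes anyway. Note that %-formatting of bytes objects was only added in Python 3.5, so on Python 3.4 only the encode-after-interpolation form works.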
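As a minimal sketch of the encode-after-interpolation fix, the snippet below pipes the question's sample line through a child process. The child is a stand-in (an assumption, since the real utility's path is not given): a Python one-liner that copies stdin to stdout byte for byte. The names ECHO and pipe_line are hypothetical.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import subprocess
import sys

# Hypothetical stand-in for the external utility: echoes stdin to stdout as raw bytes.
ECHO = [sys.executable, '-c',
        'import sys; '
        'src = getattr(sys.stdin, "buffer", sys.stdin); '
        'dst = getattr(sys.stdout, "buffer", sys.stdout); '
        'dst.write(src.read())']


def pipe_line(line):
    """Write one unicode line to the child as UTF-8 bytes and return the raw reply."""
    popen = subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # Interpolate while everything is still unicode, then encode exactly once.
    payload = ('%s\n' % line).encode('utf-8')
    popen.stdin.write(payload)
    popen.stdin.close()
    out = popen.stdout.read()
    popen.stdout.close()
    popen.wait()
    return out


echoed = pipe_line('¡Basta Ya!')
```

Because the interpolation happens on unicode strings and the result is encoded once, no implicit ascii decode takes place, and the same line runs unchanged on Python 2.7 and Python 3.4.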
If utf-8 is your locale encoding, then to communicate using Unicode strings you could use universal_newlines=True on Python 3. This works even if the locale's encoding is not utf-8, because the text is encoded and decoded with whatever the locale encoding is. Input and output are Unicode strings here (str type).
If the subprocess requires utf-8 whatever the current locale is, then communicate using bytestrings instead (pass and read bytes), encoding and decoding explicitly. That approach works the same on both Python 2 and 3.
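The two approaches described above could look roughly like this. It is a sketch under assumptions: the child process is a hypothetical byte-for-byte echo written as a Python one-liner, standing in for the real utility, and the names ECHO, roundtrip_text and roundtrip_utf8 are made up for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import subprocess
import sys

# Hypothetical stand-in for the external utility: echoes stdin to stdout as raw bytes.
ECHO = [sys.executable, '-c',
        'import sys; '
        'src = getattr(sys.stdin, "buffer", sys.stdin); '
        'dst = getattr(sys.stdout, "buffer", sys.stdout); '
        'dst.write(src.read())']


def roundtrip_text(text):
    """Python 3: exchange str objects; subprocess uses the locale encoding."""
    popen = subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             universal_newlines=True)
    out, _ = popen.communicate(text)
    return out


def roundtrip_utf8(text):
    """Python 2 and 3: exchange bytes that are always UTF-8, whatever the locale."""
    popen = subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = popen.communicate(text.encode('utf-8'))
    return out.decode('utf-8')


reply = roundtrip_utf8('¡Basta Ya!\n')
```

With roundtrip_text, whether non-ASCII characters survive depends on the locale encoding being able to represent them. roundtrip_utf8 does not depend on the locale at all, which is why it is the safer choice when the utility expects UTF-8.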