I'm struggling to understand what the Pythonic way of solving this simple problem is.
My problem is quite simple: if you use the following code, it will hang. This is well documented in the subprocess module docs.
import subprocess

proc = subprocess.Popen(['cat', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        )
for i in range(100000):
    proc.stdin.write('%d\n' % i)
output = proc.communicate()[0]
print output
Searching for a solution (there is a very insightful thread, but I've lost it now) I found this solution (among others) that uses an explicit fork:
import os
import sys
from subprocess import Popen, PIPE

def produce(to_sed):
    for i in range(100000):
        to_sed.write("%d\n" % i)
        to_sed.flush()
    # this would happen implicitly, anyway, but is here for the example
    to_sed.close()

def consume(from_sed):
    while 1:
        res = from_sed.readline()
        if not res:
            sys.exit(0)
            # sys.exit(proc.poll())
        print 'received: ', [res]

def main():
    proc = Popen(['cat', '-'], stdin=PIPE, stdout=PIPE)
    to_sed = proc.stdin
    from_sed = proc.stdout

    pid = os.fork()
    if pid == 0:
        from_sed.close()
        produce(to_sed)
        return
    else:
        to_sed.close()
        consume(from_sed)

if __name__ == '__main__':
    main()
While this solution is conceptually very easy to understand, it uses one more process and strikes me as too low-level compared to the subprocess module (which exists precisely to hide this kind of thing...).
I'm wondering: is there a simple and clean solution using the subprocess module that won't hang, or do I have to take a step back and implement an old-style select loop or an explicit fork to get this pattern working?
Thanks
Your code deadlocks as soon as cat's stdout OS pipe buffer is full. If you use stdout=PIPE, you have to consume the output in time; otherwise a deadlock like the one in your case may happen.

If you don't need the output while the process is running, you could redirect it to a temporary file:
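A minimal sketch of that idea (Python 3 assumed; cat and the 100000-line input come from the question):

import subprocess
import tempfile

# Redirect the child's stdout to a temporary file so the OS pipe buffer
# can never fill up while we are still writing to its stdin.
with tempfile.TemporaryFile() as out_file:
    proc = subprocess.Popen(['cat', '-'],
                            stdin=subprocess.PIPE,
                            stdout=out_file)
    for i in range(100000):
        proc.stdin.write(b'%d\n' % i)
    proc.stdin.close()      # send EOF so cat can finish
    proc.wait()
    out_file.seek(0)
    output = out_file.read()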
If the input/output are small (fit in memory), you could pass the input all at once and get the output all at once using .communicate(), which reads/writes concurrently for you:
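For example (a sketch; the whole input is built in memory first):

import subprocess

# Build the input up front and let communicate() do the concurrent
# writing and reading, which avoids the deadlock.
input_data = ''.join('%d\n' % i for i in range(100000)).encode()
proc = subprocess.Popen(['cat', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
output, _ = proc.communicate(input_data)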
To read/write concurrently manually, you could use threads, asyncio, fcntl, etc. @Jed provided a simple thread-based solution. Here's an asyncio-based solution:
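A sketch of such a solution (Python 3.7+ assumed for asyncio.run; error handling omitted):

import asyncio

async def run():
    proc = await asyncio.create_subprocess_exec(
        'cat', '-',
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE)

    async def produce():
        # write the input while the event loop keeps draining the output
        for i in range(100000):
            proc.stdin.write(b'%d\n' % i)
            await proc.stdin.drain()
        proc.stdin.close()

    async def consume():
        while True:
            line = await proc.stdout.readline()
            if not line:
                break
            # handle each output line here

    await asyncio.gather(produce(), consume())
    await proc.wait()

asyncio.run(run())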
On Unix, you could use an fcntl-based solution:
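A sketch of such a solution (Unix only; the PIPE_BUF-sized writes and 64 KiB reads are arbitrary choices):

import fcntl
import os
import select
import subprocess

proc = subprocess.Popen(['cat', '-'],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Put both pipes into non-blocking mode.
for pipe in (proc.stdin, proc.stdout):
    flags = fcntl.fcntl(pipe, fcntl.F_GETFL)
    fcntl.fcntl(pipe, fcntl.F_SETFL, flags | os.O_NONBLOCK)

data = b''.join(b'%d\n' % i for i in range(100000))
output = b''
stdin, stdout = proc.stdin, proc.stdout
while stdin is not None or stdout is not None:
    # Wait until at least one pipe is ready, then service it.
    rlist, wlist, _ = select.select(
        [stdout] if stdout else [], [stdin] if stdin else [], [])
    if wlist:
        n = os.write(stdin.fileno(), data[:select.PIPE_BUF])
        data = data[n:]
        if not data:          # everything written: signal EOF to the child
            stdin.close()
            stdin = None
    if rlist:
        chunk = os.read(stdout.fileno(), 64 * 1024)
        if chunk:
            output += chunk
        else:                 # EOF from the child
            stdout.close()
            stdout = None
proc.wait()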
Here's something I used to load a 6 GB MySQL dump file via subprocess. Stay away from shell=True: it isn't secure and it starts an extra shell process, wasting resources.
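A sketch of that approach (the file name, credentials, and mysql invocation below are placeholders):

import subprocess

# Stream the dump straight into mysql's stdin, no shell involved.
with open('/path/to/dump.sql', 'rb') as dump_file:
    proc = subprocess.Popen(
        ['mysql', '-u', 'user', '--password=secret', 'database'],
        stdin=dump_file)
    proc.wait()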
Here is an example (Python 3) of reading one record at a time from gzip using a pipe:
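Something along these lines (the gzip invocation and file name are assumptions made for the sake of the example):

import subprocess

# Decompress with an external gzip and read its output one line at a time.
with open('data.gz', 'rb') as compressed:
    proc = subprocess.Popen(['gzip', '-dc'],
                            stdin=compressed,
                            stdout=subprocess.PIPE)
    for line in proc.stdout:
        record = line.rstrip(b'\n')
        # process one record at a time here
    proc.wait()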
I know there is a standard module for that; it is just meant as an example. You can read the whole output in one go (like shell backticks) using the communicate method, but obviously you have to be careful about memory size.
Here is an example (Python 3 again) of writing records to the lp(1) program on Linux:
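Something like this (the records are placeholders; lp consumes its stdin as it spools, so writing line by line is safe here):

import subprocess

records = [b'first record', b'second record', b'third record']
proc = subprocess.Popen(['lp'], stdin=subprocess.PIPE)
for record in records:
    proc.stdin.write(record + b'\n')
proc.stdin.close()      # send EOF so lp submits the job
proc.wait()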
If you don't want to keep all the data in memory, you have to use select. E.g. something like:
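A sketch of a select-based loop (writing one small line per iteration so a single write can never block):

import os
import select
import subprocess

proc = subprocess.Popen(['cat', '-'],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)

lines = (b'%d\n' % i for i in range(100000))
pending = next(lines, None)
stdin, stdout = proc.stdin, proc.stdout
while stdin is not None or stdout is not None:
    rlist, wlist, _ = select.select(
        [stdout] if stdout else [], [stdin] if stdin else [], [])
    if wlist:
        os.write(stdin.fileno(), pending)   # small write, select said it is safe
        pending = next(lines, None)
        if pending is None:                 # no more input: close to signal EOF
            stdin.close()
            stdin = None
    if rlist:
        chunk = os.read(stdout.fileno(), 64 * 1024)
        if not chunk:                       # EOF from the child
            stdout.close()
            stdout = None
        # otherwise: process the chunk here instead of accumulating it
proc.wait()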
For this kind of thing, the shell works a lot better than subprocess.
1. Write very simple Python apps which read from sys.stdin and write to sys.stdout.
2. Connect the simple apps together using a shell pipeline.
3. If you want, start the pipeline using subprocess or simply write a one-line shell script.

This is very, very efficient. It's also portable to all Linux (and Windows) as long as you keep it very simple.
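A sketch of that idea (script names are placeholders): each stage is a trivial filter on sys.stdin/sys.stdout and the shell does the plumbing.

# produce.py (placeholder name): write the numbers to sys.stdout
import sys
for i in range(100000):
    sys.stdout.write('%d\n' % i)

# consume.py (placeholder name): echo what arrives on sys.stdin
import sys
for line in sys.stdin:
    sys.stdout.write('received: %s' % line)

The pipeline itself is then a one-liner, e.g. python produce.py | python consume.py, which you can also launch from Python via subprocess if you prefer.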
The simplest solution I can think of:
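A minimal thread-based sketch of one such solution (assuming a helper thread feeds stdin while the main thread drains stdout):

from subprocess import Popen, PIPE
from threading import Thread

proc = Popen(['cat', '-'], stdin=PIPE, stdout=PIPE)

def produce():
    # writing happens in a separate thread, so the main thread is free
    # to keep reading stdout and neither pipe buffer fills up
    for i in range(100000):
        proc.stdin.write(b'%d\n' % i)
    proc.stdin.close()

Thread(target=produce).start()
for line in proc.stdout:
    pass  # consume each output line here
proc.wait()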
Buffered version:
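A buffered variant of the same sketch (batching the writes so far fewer calls hit the pipe; the 1000-line batch size is an arbitrary choice):

from subprocess import Popen, PIPE
from threading import Thread

proc = Popen(['cat', '-'], stdin=PIPE, stdout=PIPE)

def produce():
    batch = []
    for i in range(100000):
        batch.append(b'%d\n' % i)
        if len(batch) >= 1000:              # flush every 1000 lines
            proc.stdin.write(b''.join(batch))
            batch = []
    if batch:
        proc.stdin.write(b''.join(batch))
    proc.stdin.close()

Thread(target=produce).start()
for line in proc.stdout:
    pass  # consume each output line here
proc.wait()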