I am writing a large amount of data to stdin.
How do I ensure that it is not blocking?
    import subprocess

    p = subprocess.Popen([path], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
    p.stdin.write('A very very very large amount of data')
    p.stdin.flush()
    output = p.stdout.readline()
It seems to hang at p.stdin.write() after I read in a large string and write it to stdin.
I have a large corpus of files (>1k files) which will be written to stdin sequentially.
So what happens is that I am running a loop:
    # this loop is repeated for all the files
    for stri in lines:
        p = subprocess.Popen([path], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
        p.stdin.write(stri)
        output = p.stdout.readline()
        # do some processing
It somehow hangs at file no. 400, which is a large file with long strings.
I do suspect it is a blocking issue.
This only happens if I iterate from 0 to 1000; if I start from file 400 instead, the error does not occur.
You may have to use Popen.communicate().
If you write a large amount of data to stdin and, while you are doing so, the child process generates output on stdout, the child's stdout pipe buffer can fill up before all of your stdin data has been processed. The child then blocks on a write to stdout (because you are not reading it) while you are blocked writing to stdin, and the two processes deadlock.
Popen.communicate() writes stdin and reads stdout/stderr at the same time, which avoids this problem. Note: Popen.communicate() is suitable only when the input and output data fit in memory (i.e. they are not too large).
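For the loop in the question, a minimal sketch of this (with path and lines being the same variables as in the question) could look like:

    for stri in lines:
        p = subprocess.Popen([path], stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE,
                             universal_newlines=True)  # text-mode pipes so plain strings can be written
        # communicate() feeds all of stri to stdin and drains stdout
        # concurrently, so neither pipe buffer can fill up and deadlock.
        output, _ = p.communicate(stri)
        # do some processing with output

Note that communicate() also waits for the child to exit, so each iteration collects the child's complete output rather than a single line.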
Update: If you decide to hack around with threads, here is an example parent and child process implementation that you can tailor to suit your needs:
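The listings below are only a minimal sketch of that idea: the child is assumed to read lines from stdin and write exactly one output line for each one, and the parent's writer and reader each run on their own thread and catch IOError in case the child goes away early.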
parent.py:
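    #!/usr/bin/env python
    import subprocess
    import sys
    import threading

    def writer(proc, lines):
        # Feed all the input on its own thread so a full stdout buffer in
        # the child can never deadlock the parent.  IOError (broken pipe)
        # means the child has exited.
        try:
            for line in lines:
                proc.stdin.write(line)
            proc.stdin.close()
        except IOError:
            pass

    def reader(proc, results):
        # Drain the child's stdout as it is produced.
        try:
            for line in proc.stdout:
                results.append(line)
        except IOError:
            pass

    if __name__ == '__main__':
        lines = ['input line %d\n' % i for i in range(100000)]
        proc = subprocess.Popen([sys.executable, 'child.py'],
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                universal_newlines=True)
        results = []
        wt = threading.Thread(target=writer, args=(proc, lines))
        rt = threading.Thread(target=reader, args=(proc, results))
        wt.start()
        rt.start()
        wt.join()
        rt.join()
        proc.wait()
        print('parent: received %d lines from the child' % len(results))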
child.py:
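    #!/usr/bin/env python
    import sys

    # Stand-in for the real program behind `path`: one line of output per
    # line of input.
    for line in sys.stdin:
        sys.stdout.write('processed: ' + line)
        sys.stdout.flush()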
Note: IOError is handled on the reader/writer threads to cover the cases where the child process exits, crashes, or is killed.

To avoid the deadlock in a portable way, write to the child in a separate thread:
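A sketch of that approach (path and lines are again the variables from the question; pump_input is just a made-up helper name):

    from subprocess import PIPE, Popen
    from threading import Thread

    def pump_input(stdin, lines):
        # Runs on its own thread: push every string, then close stdin so
        # the child sees end-of-file.
        with stdin:
            for stri in lines:
                stdin.write(stri)

    p = Popen([path], stdin=PIPE, stdout=PIPE, universal_newlines=True)
    Thread(target=pump_input, args=(p.stdin, lines)).start()
    with p.stdout:
        for output in p.stdout:
            pass  # do some processing with each output line
    p.wait()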
See Python: read streaming input from subprocess.communicate()