I have a large file that needs to be processed before it is fed to another command. I could save the processed data as a temporary file, but I would like to avoid that. I wrote a generator that processes one line at a time, and the following script to feed its output to the external command as input. However, I got an "I/O operation on closed file" exception on the second iteration of the loop:
cmd = ['intersectBed', '-a', 'stdin', '-b', bedfile]
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for entry in my_entry_generator: # <- this is my generator
    output = p.communicate(input='\t'.join(entry) + '\n')[0]
    print output
I read another similar question that uses p.stdin.write, but I still had the same problem.
What did I do wrong?
[edit] I replaced the last two statements with the following (thanks SpliFF):
output = p.communicate(input='\t'.join(entry) + '\n')
if output[1]: print "error:", output[1]
else: print output[0]
to see whether the external program reported any error. It didn't, and I still get the same exception at the p.communicate line.
Probably your intersectBed application is exiting with an error, but since you aren't printing any stderr data, you can't see it. Try printing the stderr output as well.
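A minimal sketch of that check, written in Python 3 although the question uses Python 2 syntax. run_filter is a hypothetical helper name; besides printing stderr, it also looks at the exit status (returncode), which the snippet in the question does not do. The command list is a parameter, so for the question it would be the ['intersectBed', '-a', 'stdin', '-b', bedfile] list.

```python
import subprocess
import sys


def run_filter(cmd, text):
    """Run cmd once, send text on its stdin, and return its stdout.

    Hypothetical helper: raises RuntimeError carrying the program's
    stderr if it exits with a non-zero status, so a failing external
    tool is not silently ignored.
    """
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, universal_newlines=True)
    out, err = p.communicate(input=text)  # can only be called once per Popen
    if p.returncode != 0:
        raise RuntimeError("command failed (exit %d): %s" % (p.returncode, err))
    return out


if __name__ == "__main__":
    # Stand-in command (an assumption, not intersectBed): upper-cases stdin.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_filter(upper, "chr1\t10\t20\n"))
```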
The communicate method of subprocess.Popen objects can only be called once. It sends the input you give it to the process while reading all of the stdout and stderr output. And by "all", I mean it closes the process's stdin and then waits for the process to exit, so that it knows it has all the output. Once communicate returns, the process has exited. That is also where your exception comes from: on the second pass through the loop, communicate tries to write to p.stdin, which the first call has already closed.

If you want to use communicate, you have to either restart the process inside the loop, or give it a single string containing all the input from the generator.

If you want streaming communication, sending data bit by bit, then you must not use communicate. Instead, you need to write to p.stdin while reading from p.stdout and p.stderr. Doing this is tricky, because you can't tell which output was caused by which input, and because you can easily run into deadlocks. There are third-party libraries that can help you with this, like Twisted.

If you want to do this interactively, sending some data and then waiting for and processing the result before sending more data, things get even harder. You should probably use a third-party library like pexpect for that.

Of course, if you can get away with just starting the process inside the loop, that would be a lot easier.
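A minimal sketch of the restart-inside-the-loop approach, in Python 3. process_each is a hypothetical helper name; it assumes, as the question's '\t'.join(entry) does, that each entry is a sequence of strings. A fresh Popen is created per entry because communicate can be used only once per process.

```python
import subprocess
import sys


def process_each(cmd, entries):
    """Yield the external command's output for each entry separately.

    Hypothetical helper: starts a new process for every entry, since
    communicate() closes stdin and waits for the process to exit.
    """
    for entry in entries:
        p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, universal_newlines=True)
        out, err = p.communicate(input="\t".join(entry) + "\n")
        if p.returncode != 0:
            raise RuntimeError("command failed (exit %d): %s" % (p.returncode, err))
        yield out


if __name__ == "__main__":
    # Stand-in command (an assumption, not intersectBed): upper-cases stdin.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    for output in process_each(upper, [("chr1", "10", "20"), ("chr2", "5", "9")]):
        print(output, end="")
```

Starting one process per line is simple but pays the process start-up cost on every entry, so it suits small inputs better than a very large file.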
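For the streaming case, one standard-library alternative to Twisted (a swapped-in technique, not what the answer itself proposes) is a background thread that writes the generator's lines to p.stdin while the main thread reads p.stdout, so neither side blocks on a full pipe buffer. This sketch assumes the external command reads its whole stdin and writes its results to stdout, and it lets stderr go straight to the terminal rather than piping it, which avoids a third pipe that could fill up. stream_through is a hypothetical helper name.

```python
import subprocess
import sys
import threading


def stream_through(cmd, lines):
    """Feed an iterable of text lines to ONE process and yield its output lines.

    Hypothetical helper: a writer thread pushes input while this generator
    reads output, avoiding the pipe-buffer deadlock. stderr is inherited.
    """
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)

    def feed():
        try:
            for line in lines:
                p.stdin.write(line)
        except OSError:
            pass  # child exited early (e.g. broken pipe); status checked below
        finally:
            try:
                p.stdin.close()  # EOF tells the child that input is finished
            except OSError:
                pass

    writer = threading.Thread(target=feed)
    writer.start()
    for out_line in p.stdout:
        yield out_line
    p.stdout.close()
    writer.join()
    if p.wait() != 0:
        raise RuntimeError("command exited with status %d" % p.returncode)


if __name__ == "__main__":
    # Stand-in command (an assumption, not intersectBed): upper-cases each line.
    upper = [sys.executable, "-c",
             "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())"]
    entries = [("chr1", "10", "20"), ("chr2", "5", "9")]
    for out in stream_through(upper, ("\t".join(e) + "\n" for e in entries)):
        print(out, end="")
```

As the answer warns, this only pipes data through; it does not tell you which output line came from which input line.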