In shell script, we have the following command:
/script1.pl < input_file| /script2.pl > output_file
I would like to replicate the above stream in Python using the module subprocess
. input_file
is a large file, and I can't read the whole file at once. As such I would like to pass each line, an input_string
into the pipe stream and return a string variable output_string
, until the whole file has been streamed through.
The following is a first attempt:
process = subprocess.Popen(["/script1.pl | /script2.pl"], stdin = subprocess.PIPE, stdout = subprocess.PIPE, shell = True)
process.stdin.write(input_string)
output_string = process.communicate()[0]
However, using process.communicate()[0]
closes the stream. I would like to keep the stream open for future streams. I have tried using process.stdout.readline()
, instead, but the program hangs.
To emulate /script1.pl < input_file | /script2.pl > output_file
shell command using subprocess
module in Python:
#!/usr/bin/env python
from subprocess import check_call
with open('input_file', 'rb') as input_file
with open('output_file', 'wb') as output_file:
check_call("/script1.pl | /script2.pl", shell=True,
stdin=input_file, stdout=output_file)
You could write it without shell=True
(though I don't see a reason here) based on 17.1.4.2. Replacing shell pipeline example from the docs:
#!/usr/bin/env python
from subprocess import Popen, PIPE
with open('input_file', 'rb') as input_file
script1 = Popen("/script1.pl", stdin=input_file, stdout=PIPE)
with open("output_file", "wb") as output_file:
script2 = Popen("/script2.pl", stdin=script1.stdout, stdout=output_file)
script1.stdout.close() # allow script1 to receive SIGPIPE if script2 exits
script2.wait()
script1.wait()
You could also use plumbum
module to get shell-like syntax in Python:
#!/usr/bin/env python
from plumbum import local
script1, script2 = local["/script1.pl"], local["/script2.pl"]
(script1 < "input_file" | script2 > "output_file")()
See also How do I use subprocess.Popen to connect multiple processes by pipes?
If you want to read/write line by line then the answer depends on the concrete scripts that you want to run. In general it is easy to deadlock sending/receiving input/output if you are not careful e.g., due to buffering issues.
If input doesn't depend on output in your case then a reliable cross-platform approach is to use a separate thread for each stream:
#!/usr/bin/env python
from subprocess import Popen, PIPE
from threading import Thread
def pump_input(pipe):
try:
for i in xrange(1000000000): # generate large input
print >>pipe, i
finally:
pipe.close()
p = Popen("/script1.pl | /script2.pl", shell=True, stdin=PIPE, stdout=PIPE,
bufsize=1)
Thread(target=pump_input, args=[p.stdin]).start()
try: # read output line by line as soon as the child flushes its stdout buffer
for line in iter(p.stdout.readline, b''):
print line.strip()[::-1] # print reversed lines
finally:
p.stdout.close()
p.wait()