Using POpen to send a variable to Stdin and to sen

2019-08-26 01:21发布

问题:

In shell script, we have the following command:

/script1.pl < input_file| /script2.pl > output_file

I would like to replicate the above stream in Python using the module subprocess. input_file is a large file, and I can't read the whole file at once. As such I would like to pass each line, an input_string into the pipe stream and return a string variable output_string, until the whole file has been streamed through.

The following is a first attempt:

process = subprocess.Popen(["/script1.pl | /script2.pl"], stdin = subprocess.PIPE, stdout = subprocess.PIPE, shell = True)
process.stdin.write(input_string)
output_string = process.communicate()[0]

However, using process.communicate()[0] closes the stream. I would like to keep the stream open for future streams. I have tried using process.stdout.readline(), instead, but the program hangs.

回答1:

To emulate /script1.pl < input_file | /script2.pl > output_file shell command using subprocess module in Python:

#!/usr/bin/env python
from subprocess import check_call

with open('input_file', 'rb') as input_file
    with open('output_file', 'wb') as output_file:
        check_call("/script1.pl | /script2.pl", shell=True,
                   stdin=input_file, stdout=output_file)

You could write it without shell=True (though I don't see a reason here) based on 17.1.4.2. Replacing shell pipeline example from the docs:

#!/usr/bin/env python
from subprocess import Popen, PIPE

with open('input_file', 'rb') as input_file
    script1 = Popen("/script1.pl", stdin=input_file, stdout=PIPE)
with open("output_file", "wb") as output_file:
    script2 = Popen("/script2.pl", stdin=script1.stdout, stdout=output_file)
script1.stdout.close() # allow script1 to receive SIGPIPE if script2 exits
script2.wait()
script1.wait()

You could also use plumbum module to get shell-like syntax in Python:

#!/usr/bin/env python
from plumbum import local

script1, script2 = local["/script1.pl"], local["/script2.pl"]
(script1 < "input_file" | script2 > "output_file")()

See also How do I use subprocess.Popen to connect multiple processes by pipes?


If you want to read/write line by line then the answer depends on the concrete scripts that you want to run. In general it is easy to deadlock sending/receiving input/output if you are not careful e.g., due to buffering issues.

If input doesn't depend on output in your case then a reliable cross-platform approach is to use a separate thread for each stream:

#!/usr/bin/env python
from subprocess import Popen, PIPE
from threading import Thread

def pump_input(pipe):
    try:
       for i in xrange(1000000000): # generate large input
           print >>pipe, i
    finally:
       pipe.close()

p = Popen("/script1.pl | /script2.pl", shell=True, stdin=PIPE, stdout=PIPE,
          bufsize=1)
Thread(target=pump_input, args=[p.stdin]).start()
try: # read output line by line as soon as the child flushes its stdout buffer
    for line in iter(p.stdout.readline, b''):
        print line.strip()[::-1] # print reversed lines
finally:
    p.stdout.close()
    p.wait()