Python subprocess with two inputs

Published 2019-08-27 00:05

Question:

I am writing a Python program that needs to call an external program, hmm3align, which operates as follows from the command line:

hmm3align hmm_file fasta_file -o output_file

So normally, the program expects two input files and writes the results to a third file. My program actually has multiple cases where it is calling an external program, but this is the only case where the external program has two file inputs. My intention is to avoid writing and reading files to allow these external programs to communicate with one another; I would prefer to have all data stored as Python variables during the session and feed these variables to the external programs when needed.

At the point in the Python program where hmm3align needs to be called, I already have two Python variables, hmm_model and fasta_model, that contain the info that would normally be included in hmm_file and fasta_file, respectively. What I want to do is call hmm3align by passing it hmm_model and fasta_model via stdin (because I think that's the only way possible to feed them as inputs) and then capture the results from stdout into a third Python variable named align_results. To do this, I created a separate function that uses the subprocess module as follows:

def hmmalign(hmm_model,fasta):
     args = ["/clusterfs/oha/software/bin/hmm3align",
             "-", "-",
             "-o", "/dev/stdout"]
     process = subprocess.Popen(args, shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     return process.communicate(hmm_model,fasta)[0]

So as you can see, I am trying to send both variables via stdin. The two "-" in the args list are meant to capture these two variables; I have seen the "-" used in other examples but their purpose was not clear and I may be misunderstanding things.

Sure enough, I get the following error at the end of the Traceback:

TypeError: communicate() takes at most 2 arguments (3 given)

So I cannot pass two separate variables via stdin to the program. I should mention that I have been able to make subprocess work on a similar external program when that program needed only one input file.

How do I make this work? Is it possible to use subprocess with more than one input? I have looked at the documentation and haven't seen this question answered. Thanks in advance.

Answer 1:

Standard input is a single data stream; on Unix it is a file descriptor, and when you pass stdin=subprocess.PIPE it is connected to the read end of a unidirectional pipe. By convention, many programs that read a single file named on the command line interpret - as an instruction to read from stdin instead of from a file. A program that reads two files, however, cannot read stdin twice, because stdin is a single stream of data. This is also why communicate() accepts only one input argument.
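To illustrate the single-stream point, here is a minimal Python 3 sketch of the one-input case. It uses the Unix cat utility as a stand-in for an external program that honours the - convention; pipe_through is a hypothetical helper name, not part of subprocess.

```python
import subprocess


def pipe_through(args, data):
    """Run args, send data (bytes) on stdin, and return the child's stdout."""
    process = subprocess.Popen(
        args, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # communicate() takes a single input: the one and only stdin stream.
    output, _ = process.communicate(data)
    return output


# "cat -" reads from stdin because of the "-" convention.
print(pipe_through(["cat", "-"], b"only one stream\n"))
```

There is no second argument to hand a second payload to, which matches the TypeError in the question.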

A process can have other file descriptors besides the standard three (stdin is fd 0, stdout is fd 1, stderr is fd 2), and they can be used for communication, but there is no widely followed convention for telling a program to read from them instead of from named files.

The solution that is most likely to work here is named pipes (FIFOs); in Python, use os.mkfifo to create a named pipe and os.unlink to delete it. You can then pass its name to the program (it will appear as a file that can be read from) while writing to it (using open).
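The named-pipe approach can be sketched roughly as follows (Python 3, Unix only). This is an illustration under assumptions, not a tested hmm3align recipe: run_with_two_inputs, _feed, and make_args are hypothetical names, and the example uses cat as a stand-in program. Each FIFO is written from its own thread because opening a FIFO for writing blocks until the child opens it for reading, and the child decides the order in which it reads its inputs.

```python
import os
import subprocess
import tempfile
import threading


def _feed(path, data):
    # open() blocks here until the child opens the FIFO for reading.
    with open(path, "wb") as fifo:
        fifo.write(data)


def run_with_two_inputs(make_args, first, second):
    """Feed two bytes objects to a child through two named pipes.

    make_args(path1, path2) must return the argument list for the child.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [os.path.join(tmpdir, name)
                 for name in ("first.fifo", "second.fifo")]
        for path in paths:
            os.mkfifo(path)
        try:
            process = subprocess.Popen(
                make_args(*paths),
                stdin=subprocess.DEVNULL,
                stdout=subprocess.PIPE,
            )
            writers = [threading.Thread(target=_feed, args=(path, data))
                       for path, data in zip(paths, (first, second))]
            for writer in writers:
                writer.start()
            output, _ = process.communicate()
            for writer in writers:
                writer.join()
            return output
        finally:
            # Remove the named pipes once the child is done with them.
            for path in paths:
                os.unlink(path)


# Stand-in usage: cat reads both FIFOs as if they were ordinary files.
print(run_with_two_inputs(lambda a, b: ["cat", a, b], b"hmm\n", b"fasta\n"))
```

For the program in the question, make_args might look like lambda h, f: ["/clusterfs/oha/software/bin/hmm3align", h, f, "-o", "/dev/stdout"], assuming hmm3align reads its inputs sequentially and does not need to seek in them (a FIFO is not seekable). Note that if the child exits without opening one of the FIFOs, the corresponding writer thread stays blocked, so production code would need a timeout or similar safeguard.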