Alternative in Python to subprocess

Published 2020-06-18 05:27

Question:

I am trying to write a script that has to make many calls to bash commands, parse and process their output, and finally produce some result.

I was using subprocess.Popen and subprocess.call

If I understand correctly, these methods spawn a bash process, run the command, get the output, and then kill the process.

Is there a way to have a bash process running in the background continuously and then the python calls could just go directly to that process? This would be something like bash running as a server and python calls going to it.

I feel this would optimize the calls a bit as there is no bash process setup and teardown. Or will it give no performance advantage?

Answer 1:

If I understand correctly, these methods spawn a bash process, run the command, get the output, and then kill the process.

subprocess.Popen is a bit more involved. Its communicate() method does the child-process I/O in the background to avoid deadlocks. See https://www.python.org/dev/peps/pep-0324/:

A communicate() method, which makes it easy to send stdin data and read stdout and stderr data, without risking deadlocks. Most people are aware of the flow control issues involved with child process communication, but not all have the patience or skills to write a fully correct and deadlock-free select loop. This means that many Python applications contain race conditions. A communicate() method in the standard library solves this problem.
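A minimal sketch of communicate() in action: it feeds stdin and drains stdout concurrently, so neither pipe can fill up and deadlock the parent (tr here just uppercases its input):

```python
import subprocess

# tr reads stdin and uppercases it; piping both stdin and stdout
# is exactly the situation where naive write-then-read can deadlock.
p = subprocess.Popen(
    ["tr", "a-z", "A-Z"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
# communicate() sends the input and collects all output safely.
out, err = p.communicate("hello world\n")
print(out)  # -> HELLO WORLD
```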


Is there a way to have a bash process running in the background continuously and then the python calls could just go directly to that process?

Sure, you can still use subprocess.Popen and send messages to your subprocess and receive messages back without terminating the subprocess. In the simplest case your messages can be lines.

This allows for request-response style protocols as well as publish-subscribe, where the subprocess keeps sending you messages back whenever an event of interest happens.
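Here is a sketch of the simplest line-based request-response version, assuming a POSIX system with bash on PATH and that each command produces exactly one line of output (the helper name ask is made up for illustration):

```python
import subprocess

# One long-lived bash process used as a "server" over pipes: no
# per-call process setup/teardown.
proc = subprocess.Popen(
    ["bash"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,  # line-buffered writes on the Python side
)

def ask(command):
    """Send one command line; read one line of output back."""
    proc.stdin.write(command + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")

greeting = ask("echo hello")     # bash echoes back: 'hello'
answer = ask("echo $((6 * 7))")  # bash does the arithmetic: '42'

proc.stdin.close()  # bash exits when its stdin closes
proc.wait()
```

A real protocol needs explicit delimiters or a sentinel, since command output can span any number of lines, and stderr is not captured here at all.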



Answer 2:

I feel this would optimize the calls a bit as there is no bash process setup and teardown.

subprocess never runs the shell unless you ask it to explicitly, e.g.:

#!/usr/bin/env python
import subprocess

subprocess.check_call(['ls', '-l'])

This call runs the ls program without invoking /bin/sh.
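One way to see the difference is variable expansion, which only happens when a shell is involved:

```python
import subprocess

# Without a shell: '$HOME' reaches echo as a literal argument.
no_shell = subprocess.check_output(["echo", "$HOME"], text=True).strip()

# With shell=True: /bin/sh expands the variable before echo sees it.
with_shell = subprocess.check_output("echo $HOME", shell=True, text=True).strip()

print(no_shell)    # -> $HOME
print(with_shell)  # e.g. /home/user
```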

Or will it give no performance advantage?

If your subprocess calls actually use the shell, e.g., to specify a pipeline concisely, or you use bash process substitution that would be verbose and error-prone to define using the subprocess module directly, then it is unlikely that invoking bash is a performance bottleneck -- measure it first.
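For comparison, a shell pipeline such as `printf 'b\nc\na\n' | sort` can be built directly with subprocess, with no shell process involved -- a sketch assuming printf and sort are on PATH:

```python
import subprocess

# First stage of the pipeline: printf emits three lines.
p1 = subprocess.Popen(["printf", "b\nc\na\n"], stdout=subprocess.PIPE)

# Second stage: sort reads directly from printf's stdout.
p2 = subprocess.Popen(["sort"], stdin=p1.stdout,
                      stdout=subprocess.PIPE, text=True)

p1.stdout.close()  # allow p1 to receive SIGPIPE if p2 exits early
out, _ = p2.communicate()
p1.wait()
print(out)  # -> 'a\nb\nc\n'
```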

There are Python packages that also allow you to specify such commands concisely; e.g., plumbum can be used to emulate a shell pipeline.

If you want to use bash as a server process, then pexpect is useful for dialog-based interactions with an external process -- though it is unlikely to improve time performance. fabric allows you to run both local and remote (ssh) commands.

There are other subprocess wrappers, such as sarge, which can parse a pipeline specified in a string without invoking the shell, e.g., it enables cross-platform support for bash-like syntax (&&, ||, & in command lines); or sh, a complete subprocess replacement on Unix that provides a TTY by default (it seems full-featured, but its shell-like piping is less straightforward). You can even use Python-ish bashwards-looking syntax to run commands with the xonsh shell. Again, it is unlikely to affect performance in a meaningful way in most cases.

The problem of starting and communicating with external processes in a portable manner is complex -- the interaction between processes, pipes, ttys, signals, threading, async I/O, and buffering in various places has rough edges. Introducing a new package may complicate things if you don't know how a specific package solves the numerous issues related to running shell commands.