I am trying to write a script that has to make a lot of calls to bash commands, parse and process their output, and finally produce some output.
I was using `subprocess.Popen` and `subprocess.call`. If I understand correctly, these methods spawn a bash process, run the command, get the output and then kill the process.
Is there a way to have a bash process running in the background continuously, so that the Python calls could go directly to that process? This would be something like bash running as a server and the Python calls going to it.

I feel this would optimize the calls a bit, as there is no bash process setup and teardown. Or will it give no performance advantage?
> If I understand correctly, these methods spawn a bash process, run the command, get the output and then kill the process.
`subprocess.Popen` is a bit more involved. Its `communicate()` method actually uses an I/O thread (or a `select` loop, depending on the platform) to avoid deadlocks. See https://www.python.org/dev/peps/pep-0324/:
> A `communicate()` method, which makes it easy to send stdin data and read stdout and stderr data, without risking deadlocks. Most people are aware of the flow control issues involved with child process communication, but not all have the patience or skills to write a fully correct and deadlock-free select loop. This means that many Python applications contain race conditions. A `communicate()` method in the standard library solves this problem.
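
For example, `communicate()` lets you feed data to stdin and collect stdout in one call, without hand-managing the pipes. A minimal sketch (the `grep` filter here is just an arbitrary command for illustration):

```python
#!/usr/bin/env python
import subprocess

# Run a single command, feed it stdin, and collect stdout in one call;
# communicate() handles the pipe flow control internally.
p = subprocess.Popen(['grep', 'b'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     universal_newlines=True)
out, _ = p.communicate('apple\nbanana\ncherry\n')
print(out)  # -> banana
```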
> Is there a way to have a bash process running in the background continuously, so that the Python calls could go directly to that process?
Sure, you can still use `subprocess.Popen` and send messages to your subprocess and receive messages back without terminating the subprocess. In the simplest case your messages can be lines. This allows for request-response style protocols as well as publish-subscribe, where the subprocess can keep sending you messages back whenever an event of interest happens. A minimal sketch of the request-response case follows.
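
This sketch assumes each command writes exactly one line of output; a real protocol would need an explicit delimiter or a length prefix:

```python
#!/usr/bin/env python
import subprocess

# Keep one bash process alive and talk to it over its stdin/stdout pipes.
proc = subprocess.Popen(['bash'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True,  # text mode
                        bufsize=1)                # line-buffered

def ask(command):
    """Send one command; read one line of its output."""
    proc.stdin.write(command + '\n')
    proc.stdin.flush()
    return proc.stdout.readline().rstrip('\n')

print(ask('echo hello'))  # -> hello
print(ask('pwd'))         # -> current working directory

proc.stdin.close()
proc.wait()
```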
> I feel this would optimize the calls a bit, as there is no bash process setup and teardown.
`subprocess` never runs the shell unless you ask for it explicitly, e.g.:

```python
#!/usr/bin/env python
import subprocess

subprocess.check_call(['ls', '-l'])
```

This call runs the `ls` program without invoking `/bin/sh`.
> Or will it give no performance advantage?
If your subprocess calls actually use the shell, e.g., to specify a pipeline concisely, or if you use bash process substitution that would be verbose and error-prone to express with the `subprocess` module directly, then it is unlikely that invoking `bash` is the performance bottleneck -- measure it first. You can also build a pipeline without the shell by wiring the processes' pipes together yourself, as the sketch below shows.
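
A minimal sketch of `ls -l | wc -l` with plain `subprocess`, following the pipeline pattern from the Python docs:

```python
#!/usr/bin/env python
import subprocess

# Emulate `ls -l | wc -l` without /bin/sh: connect ls's stdout to wc's stdin.
p1 = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['wc', '-l'],
                      stdin=p1.stdout,
                      stdout=subprocess.PIPE,
                      universal_newlines=True)
p1.stdout.close()  # so ls gets SIGPIPE if wc exits first
line_count = p2.communicate()[0].strip()
print(line_count)
```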
There are Python packages that also allow you to specify such commands concisely, e.g., `plumbum` could be used to emulate a shell pipeline.
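
A sketch using `plumbum`'s documented piping syntax (assumes `pip install plumbum`):

```python
from plumbum.cmd import ls, wc

# Build the pipeline `ls -l | wc -l`; nothing runs until the chain is called.
chain = ls['-l'] | wc['-l']
print(chain())  # executes the pipeline and returns its stdout
```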
If you want to use `bash` as a server process then `pexpect` is useful for dialog-based interactions with an external process -- though it is unlikely to affect time performance. `fabric` allows you to run both local and remote commands (over `ssh`).
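
For example, `pexpect` ships a `replwrap` helper that wraps bash as exactly this kind of long-lived server process (a sketch, assuming `pexpect` is installed):

```python
from pexpect import replwrap

# One long-lived bash process; run_command() sends a command and
# returns its output once the prompt reappears.
bash = replwrap.bash()
print(bash.run_command('echo hello'))
print(bash.run_command('pwd'))
```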
There are other subprocess wrappers such as `sarge`, which can parse a pipeline specified in a string without invoking the shell, e.g., it enables cross-platform support for bash-like syntax (`&&`, `||`, `&` in command lines), or `sh` -- a complete `subprocess` replacement on Unix that provides a TTY by default (it seems full-featured, but the shell-like piping is less straightforward). You can even use Python-ish BASHwards-looking syntax to run commands with the `xonsh` shell. A short `sh` sketch follows.
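
A sketch with the `sh` package (Unix only; assumes `pip install sh`), using its idiom of passing one command's output as another command's input:

```python
import sh

# Run `ls -l` and print its output.
print(sh.ls('-l'))

# Rough equivalent of `ls -1 | wc -l`: feed ls's output into wc.
print(sh.wc(sh.ls('-1'), '-l'))
```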
Again, none of this is likely to affect performance in a meaningful way in most cases.
The problem of starting and communicating with external processes in a portable manner is complex -- the interaction between processes, pipes, ttys, signals, threading, async I/O, and buffering in various places has rough edges. Introducing a new package may complicate things if you don't know how a specific package solves the numerous issues related to running shell commands.