On my local machine I run a Python script which contains these lines:

```python
bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
os.system(bashCommand)
```

This works fine.

Then I run the same code on a server and I get the following error message:
```
'import site' failed; use -v for traceback
Traceback (most recent call last):
  File "/usr/bin/cwm", line 48, in <module>
    from swap import diag
ImportError: No module named swap
```
So what I did then was insert a `print bashCommand`, which prints the command in the terminal before running it with `os.system()`.

Of course I get the error again (caused by `os.system(bashCommand)`), but before that error it prints the command in the terminal. Then I just copied that output, pasted it into the terminal, hit Enter, and it works...

Does anyone have a clue what's going on :(?
To somewhat expand on the earlier answers here, there are a number of details which are commonly overlooked.

- Prefer `subprocess.run()` over `subprocess.check_call()` and friends over `subprocess.call()` over `subprocess.Popen()` over `os.system()` over `os.popen()`
- Understand and probably use `text=True`, aka `universal_newlines=True`
- Understand the meaning of `shell=True` or `shell=False` and how it changes quoting and the availability of shell conveniences
- Understand differences between `sh` and Bash

These topics are covered in some more detail below.
Prefer `subprocess.run()` or `subprocess.check_call()`

The `subprocess.Popen()` function is a low-level workhorse, but it is tricky to use correctly and you end up copy/pasting multiple lines of code ... which conveniently already exist in the standard library as a set of higher-level wrapper functions for various purposes, which are presented in more detail in the following.

Here's a paragraph from the documentation:

> The recommended approach to invoking subprocesses is to use the `run()` function for all use cases it can handle. For more advanced use cases, the underlying `Popen` interface can be used directly.

Unfortunately, the availability of these wrapper functions differs between Python versions.
- `subprocess.run()` was officially introduced in Python 3.5. It is meant to replace all of the following.
- `subprocess.check_output()` was introduced in Python 2.7 / 3.1. It is basically equivalent to `subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout`
- `subprocess.check_call()` was introduced in Python 2.5. It is basically equivalent to `subprocess.run(..., check=True)`
- `subprocess.call()` was introduced in Python 2.4 in the original `subprocess` module (PEP-324). It is basically equivalent to `subprocess.run(...).returncode`
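To make those equivalences concrete, here is a small sketch that runs the Python interpreter itself as a portable child process (the child commands are illustrative):

```python
import subprocess
import sys

cmd = [sys.executable, '-c', 'print("hi")']

# subprocess.check_output(...) ~ subprocess.run(..., check=True, stdout=PIPE).stdout
out1 = subprocess.check_output(cmd)
out2 = subprocess.run(cmd, check=True, stdout=subprocess.PIPE).stdout
assert out1 == out2

# subprocess.call(...) ~ subprocess.run(...).returncode
rc1 = subprocess.call(cmd, stdout=subprocess.DEVNULL)
rc2 = subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode
assert rc1 == rc2 == 0
```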
High-level API vs `subprocess.Popen()`

The refactored and extended `subprocess.run()` is more logical and more versatile than the older legacy functions it replaces. It returns a `CompletedProcess` object which has various methods which allow you to retrieve the exit status, the standard output, and a few other results and status indicators from the finished subprocess.

`subprocess.run()` is the way to go if you simply need a program to run and return control to Python. For more involved scenarios (background processes, perhaps with interactive I/O with the Python parent program) you still need to use `subprocess.Popen()` and take care of all the plumbing yourself. This requires a fairly intricate understanding of all the moving parts and should not be undertaken lightly. The simpler `Popen` object represents the (possibly still-running) process which needs to be managed from your code for the remainder of the lifetime of the subprocess.

It should perhaps be emphasized that `subprocess.Popen()` merely creates a process. If you leave it at that, you have a subprocess running concurrently alongside Python, i.e. a "background" process. If it doesn't need to do input or output or otherwise coordinate with you, it can do useful work in parallel with your Python program.

Avoid `os.system()` and `os.popen()`
Since time eternal (well, since Python 2.5) the `os` module documentation has contained the recommendation to prefer `subprocess` over `os.system()`:

> The `subprocess` module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

The problems with `system()` are that it's obviously system-dependent and doesn't offer ways to interact with the subprocess. It simply runs, with standard output and standard error outside of Python's reach. The only information Python receives back is the exit status of the command (zero means success, though the meaning of non-zero values is also somewhat system-dependent).

PEP-324 (which was already mentioned above) contains a more detailed rationale for why `os.system` is problematic and how `subprocess` attempts to solve those issues.

`os.popen()` is even more strongly discouraged.

Understand and usually use `check=True`

You'll also notice that `subprocess.call()` has many of the same limitations as `os.system()`. In regular use, you should generally check whether the process finished successfully, which `subprocess.check_call()` and `subprocess.check_output()` do (where the latter also returns the standard output of the finished subprocess). Similarly, you should usually use `check=True` with `subprocess.run()` unless you specifically need to allow the subprocess to return an error status.

In practice, with `check=True` or `subprocess.check_*`, Python will throw a `CalledProcessError` exception if the subprocess returns a nonzero exit status.

A common error with `subprocess.run()` is to omit `check=True` and be surprised when downstream code fails if the subprocess failed.

On the other hand, a common problem with `check_call()` and `check_output()` was that users who blindly used these functions were surprised when the exception was raised e.g. when `grep` did not find a match. (You should probably replace `grep` with native Python code anyway, as outlined below.)

All things counted, you need to understand how shell commands return an exit code, and under what conditions they will return a non-zero (error) exit code, and make a conscious decision how exactly it should be handled.
Understand and probably use `text=True`, aka `universal_newlines=True`
Since Python 3, strings internal to Python are Unicode strings. But there is no guarantee that a subprocess generates Unicode output, or strings at all.
(If the differences are not immediately obvious, Ned Batchelder's Pragmatic Unicode is recommended, if not outright obligatory, reading. There is a 36-minute video presentation behind the link if you prefer, though reading the page yourself will probably take significantly less time.)
Deep down, Python has to fetch a `bytes` buffer and interpret it somehow. If it contains a blob of binary data, it shouldn't be decoded into a Unicode string, because that's error-prone and bug-inducing behavior - precisely the sort of pesky behavior which riddled many Python 2 scripts, before there was a way to properly distinguish between encoded text and binary data.

With `text=True`, you tell Python that you, in fact, expect back textual data in the system's default encoding, and that it should be decoded into a Python (Unicode) string to the best of Python's ability (usually UTF-8 on any moderately up-to-date system, except perhaps Windows?).

If that's not what you request back, Python will just give you `bytes` strings in the `stdout` and `stderr` strings. Maybe at some later point you do know that they were text strings after all, and you know their encoding. Then, you can decode them.

Python 3.7 introduced the shorter and more descriptive and understandable alias `text` for the keyword argument which was previously somewhat misleadingly called `universal_newlines`.
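A minimal sketch of the difference (again using the Python interpreter as a portable child process):

```python
import subprocess
import sys

cmd = [sys.executable, '-c', 'print("hello")']

# Default: stdout comes back as a bytes string.
raw = subprocess.run(cmd, stdout=subprocess.PIPE).stdout

# With text=True (Python 3.7+; use universal_newlines=True on older versions),
# stdout is decoded into a str using the system's default encoding.
decoded = subprocess.run(cmd, stdout=subprocess.PIPE, text=True).stdout
```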
Understand `shell=True` vs `shell=False`

With `shell=True` you pass a single string to your shell, and the shell takes it from there.

With `shell=False` you pass a list of arguments to the OS, bypassing the shell.

When you don't have a shell, you save a process and get rid of a fairly substantial amount of hidden complexity, which may or may not harbor bugs or even security problems.

On the other hand, when you don't have a shell, you don't have redirection, wildcard expansion, job control, and a large number of other shell features.

A common mistake is to use `shell=True` and then still pass Python a list of tokens, or vice versa. This happens to work in some cases, but is really ill-defined and could break in interesting ways. The common retort "but it works for me" is not a useful rebuttal unless you understand exactly under what circumstances it could stop working.
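A small sketch of the two calling conventions (the `echo`/`$HOME` example is illustrative):

```python
import subprocess

# shell=True: one string; the shell parses quoting and expands variables.
with_shell = subprocess.run('echo "$HOME"', shell=True,
                            stdout=subprocess.PIPE, text=True)

# shell=False (the default): a list of tokens, no shell in between;
# "$HOME" is passed through literally, not expanded.
without_shell = subprocess.run(['echo', '$HOME'],
                               stdout=subprocess.PIPE, text=True)
```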
Refactoring Example
Very often, the features of the shell can be replaced with native Python code. Simple Awk or `sed` scripts should probably simply be translated to Python instead.

To partially illustrate this, here is a typical but slightly silly example which involves many shell features.
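The example that originally accompanied this section was not preserved in this copy; here is a stand-in sketch in the same spirit: a shell snippet using a loop, a pipeline, quoting, and redirection, followed by a native-Python refactoring that also adds the host name to each output line.

```python
import platform
import subprocess

with open('input.txt', 'w') as f:
    f.write('hello\nworld\n')

# Shell version: a loop, a pipeline, quoting, and redirection, all in one string.
shell_script = '''
while read -r line; do
    echo "$line" | tr "a-z" "A-Z"
done <input.txt >output.txt
'''
subprocess.run(shell_script, shell=True, check=True)

# Refactored: read the file in Python, run tr once without a shell,
# and add the host name to each output line along the way.
with open('input.txt') as infile:
    result = subprocess.run(
        ['tr', 'a-z', 'A-Z'],   # with shell=False, no shell quoting is needed
        stdin=infile, stdout=subprocess.PIPE, text=True, check=True)

hostname = platform.node()
with open('output.txt', 'w') as outfile:
    for line in result.stdout.splitlines():
        print('{}: {}'.format(hostname, line), file=outfile)
```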
Some things to note here:
- With `shell=False` you don't need the quoting that the shell requires around strings. Putting quotes anyway is probably an error.

The refactored code also illustrates just how much the shell really does for you with a very terse syntax -- for better or for worse. Python says explicit is better than implicit, but the Python code is rather verbose and arguably looks more complex than this really is. On the other hand, it offers a number of points where you can grab control in the middle of something else, as trivially exemplified by the enhancement that we can easily include the host name along with the shell command output. (This is by no means challenging to do in the shell, either, but at the expense of yet another diversion and perhaps another process.)
Common Shell Constructs

For completeness, here are brief explanations of some of these shell features, and some notes on how they can perhaps be replaced with native Python facilities.

- Wildcard ("globbing") expansion of patterns like `*.txt` can be replaced with `glob.glob()` or very often with simple Python string comparisons like `for file in os.listdir('.'): if not file.endswith('.png'): continue`. Bash has various other expansion facilities like `.{png,jpg}` brace expansion and `{1..100}`, as well as tilde expansion (`~` expands to your home directory, and more generally `~account` to the home directory of another user).
- Redirection: `grep 'foo' <inputfile >outputfile` opens `outputfile` for writing and `inputfile` for reading, and passes its contents as standard input to `grep`, whose standard output then lands in `outputfile`. This is not generally hard to replace with native Python code.
- Pipelines: `echo foo | nl` runs two subprocesses, where the standard output of `echo` is the standard input of `nl` (on the OS level, in Unix-like systems, this is a single file handle). If you cannot replace one or both ends of the pipeline with native Python code, perhaps think about using a shell after all, especially if the pipeline has more than two or three processes (though look at the `pipes` module in the Python standard library or a number of more modern and versatile third-party competitors).

Understand differences between `sh` and Bash

`subprocess` runs your shell commands with `/bin/sh` unless you specifically request otherwise (except of course on Windows, where it uses the value of the `COMSPEC` variable). This means that various Bash-only features like arrays, `[[` etc. are not available.

If you need to use Bash-only syntax, you can pass in the path to the shell as `executable='/bin/bash'` (where of course, if your Bash is installed somewhere else, you need to adjust the path).

A `subprocess` is separate from its parent, and cannot change it

A somewhat common mistake is doing something like the following.
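(The original snippet was not preserved in this copy; this is a hedged reconstruction assuming the classic `cd` mistake.)

```python
import os
import subprocess

before = os.getcwd()
# Each command runs in its own child; neither affects the Python process.
subprocess.run('cd /tmp', shell=True)   # only that child changes directory
subprocess.run('pwd', shell=True)       # a *new* child: prints the old directory
after = os.getcwd()
# before == after: the parent's working directory is untouched
```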
This, aside from the lack of elegance, also betrays a fundamental lack of understanding of the "sub" part of the name "subprocess".
A child process runs completely separate from Python, and when it finishes, Python has no idea what it did (apart from the vague indicators that it can infer from the exit status and output from the child process). A child generally cannot change the parent's environment; it cannot set a variable, change the working directory, or, in so many words, communicate with its parent without cooperation from the parent.
The immediate fix in this particular case is to run both commands in a single subprocess;
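For instance (an illustrative `cd`/`pwd` pair):

```python
import subprocess

# Both commands share one shell, so the cd affects the pwd.
result = subprocess.run('cd /tmp; pwd', shell=True,
                        stdout=subprocess.PIPE, text=True)
```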
though obviously this particular use case doesn't require the shell at all. Remember, you can manipulate the environment of the current process (and thus also its children) via
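For example (the specific `PATH` manipulation is illustrative):

```python
import os

# Changes the parent's environment; subsequently spawned children inherit it.
os.environ['PATH'] = os.pathsep.join(['/opt/bin', os.environ['PATH']])
```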
or pass an environment setting to a child process with
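For example (an illustrative sketch; `/usr/bin/env` simply prints the environment it receives):

```python
import subprocess

# env= gives the child exactly this environment (it does not merge with os.environ)
result = subprocess.run(['/usr/bin/env'], env={'foo': 'bar'},
                        stdout=subprocess.PIPE, text=True)
print(result.stdout)  # the child sees only foo=bar
```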
(not to mention the obvious refactoring `subprocess.run(['echo', 'bar'])`; but `echo` is a poor example of something to run in a subprocess in the first place, of course).

Don't run Python from Python
This is slightly dubious advice; there are certainly situations where it does make sense or is even an absolute requirement to run the Python interpreter as a subprocess from a Python script. But very frequently, the correct approach is simply to `import` the other Python module into your calling script and call its functions directly.

If the other Python script is under your control, and it isn't a module, consider turning it into one. (This answer is too long already so I will not delve into details here.)

If you need parallelism, you can run Python functions in subprocesses with the `multiprocessing` module. There is also `threading`, which runs multiple tasks in a single process (which is more lightweight and gives you more control, but also more constrained in that threads within a process are tightly coupled, and bound to a single GIL).

To run the command without a shell, pass the command as a list and implement the redirection in Python using `subprocess`:
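The code block that belonged here was not preserved in this copy; a sketch of what it plausibly showed, guarded with `shutil.which` since `cwm` may not be installed:

```python
import shutil
import subprocess

command = ['cwm', '--rdf', 'test.rdf', '--ntriples']
if shutil.which('cwm'):  # guard: cwm may not be installed on this machine
    with open('test.nt', 'wb') as file:
        # stdout=file plays the role of "> test.nt"
        subprocess.check_call(command, stdout=file)
```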
Note: there is no `> test.nt` at the end; `stdout=file` implements the redirection.

To run the command using the shell in Python, pass the command as a string and enable `shell=True`:
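The original snippet was not preserved; a sketch of the shell form (guarded, since `cwm` may not be installed):

```python
import shutil
import subprocess

shell_command = 'cwm --rdf test.rdf --ntriples > test.nt'
if shutil.which('cwm'):  # guard: cwm may not be installed on this machine
    subprocess.check_call(shell_command, shell=True)
```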
Here the shell is responsible for the output redirection (`> test.nt` is in the command).

To run a Bash command that uses bashisms, specify the Bash executable explicitly, e.g., to emulate Bash process substitution:
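The original snippet was not preserved; an illustrative sketch using `cat` and a process substitution, which `/bin/sh` could not parse:

```python
import subprocess

# <(...) is a process substitution, a bashism that /bin/sh cannot parse
result = subprocess.run('cat <(echo hello)', shell=True,
                        executable='/bin/bash',
                        stdout=subprocess.PIPE, text=True)
```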
Call it with `subprocess`:
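The code for this answer was not preserved in this copy; a sketch of the `Popen`-based call it plausibly showed. Note that with `.split()` and no shell, a `> test.nt` redirection would be passed to `cwm` as literal arguments, so it is omitted here:

```python
import shutil
import subprocess

bashCommand = "cwm --rdf test.rdf --ntriples"
if shutil.which('cwm'):  # guard: cwm may not be installed on this machine
    process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
    output, error = process.communicate()
```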
The error you are getting seems to be because there is no `swap` module on the server; you should install `swap` on the server and then run the script again.
You can use 'subprocess', but I always felt that it was not a 'Pythonic' way of doing it. So I created Sultan (shameless plug) that makes it easy to run command line functions.
https://github.com/aeroxis/sultan
Don't use `os.system`. It has been deprecated in favor of `subprocess`. From the docs: "This module intends to replace several older modules and functions: `os.system`, `os.spawn`".

Like in your case:
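The snippet that followed here was not preserved; a sketch using `subprocess.run` (with `shell=True`, since the command string contains a redirection):

```python
import shutil
import subprocess

bash_command = "cwm --rdf test.rdf --ntriples > test.nt"
if shutil.which('cwm'):  # guard: cwm may not be installed on this machine
    # shell=True keeps the "> test.nt" redirection working
    subprocess.run(bash_command, shell=True, check=True)
```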
You can also use the `bash` program itself, with the `-c` parameter to execute the command:
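A sketch of that approach (guarded, since `cwm` may not be installed on this machine):

```python
import shutil
import subprocess

# bash -c executes the string that follows, redirection included
command = ['bash', '-c', 'cwm --rdf test.rdf --ntriples > test.nt']
if shutil.which('cwm'):  # guard: cwm may not be installed on this machine
    subprocess.run(command, check=True)
```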