Hi, I'm trying to call the following command from Python:
comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v "#" | sed "s/\t//g"
How can I make this call when the inputs to the comm command are themselves piped?
Is there an easy and straightforward way to do it?
I tried the subprocess module:
subprocess.call("comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'")
Without success; it fails with: OSError: [Errno 2] No such file or directory
Or do I have to build the individual calls separately and then connect them using PIPE, as described in the subprocess documentation:
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
Also check out plumbum. It makes life easier.
http://plumbum.readthedocs.io/en/latest/
In particular, see the Pipelining section.
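For example, a minimal plumbum pipelining sketch (it covers only the pipeline part of your command, not the process substitution):

from plumbum.cmd import awk, sort, uniq, grep

# Build the pipeline lazily with |, then run it by calling it.
chain = awk["{print $1}", "File1.txt"] | sort | uniq | grep["-v", "#"]
print(chain())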
This may be wrong, but you can try this:
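(A sketch of the idea, not necessarily the exact code: this simply hands the whole command string to a shell with shell=True.)

import subprocess

# Hand the whole pipeline to a shell; note that shell=True runs it via /bin/sh.
subprocess.call(
    "comm -3 <(awk '{print $1}' File1.txt | sort | uniq) "
    "<(awk '{print $1}' File2.txt | sort | uniq) "
    "| grep -v '#' | sed 's/\\t//g'",
    shell=True)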
Let me know if this goes wrong and I'll try to fix it.
Edit: As pointed out, I missed the process substitution part, and I think it has to be done explicitly by redirecting each awk | sort | uniq output to a temporary file and then passing those files as arguments to comm.
So the above would now actually become:
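(Again a sketch; temp1.txt and temp2.txt are placeholder names for the temporary files.)

import subprocess

# Materialise each <( ... ) into a temporary file first...
subprocess.call("awk '{print $1}' File1.txt | sort | uniq > temp1.txt", shell=True)
subprocess.call("awk '{print $1}' File2.txt | sort | uniq > temp2.txt", shell=True)
# ...then run comm on those files.
subprocess.call("comm -3 temp1.txt temp2.txt | grep -v '#' | sed 's/\\t//g'", shell=True)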
Alternatively, you can use the method described by @charles, making use of mkfifo.
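A rough sketch of that mkfifo idea (the FIFO paths are placeholders; error handling and cleanup omitted):

import os
import subprocess
import tempfile

tmpdir = tempfile.mkdtemp()
fifo1 = os.path.join(tmpdir, "col1")
fifo2 = os.path.join(tmpdir, "col2")
os.mkfifo(fifo1)
os.mkfifo(fifo2)

# The writers must run in the background: opening a FIFO blocks until both ends are open.
w1 = subprocess.Popen("awk '{print $1}' File1.txt | sort | uniq > " + fifo1, shell=True)
w2 = subprocess.Popen("awk '{print $1}' File2.txt | sort | uniq > " + fifo2, shell=True)

out = subprocess.check_output(
    "comm -3 %s %s | grep -v '#' | sed 's/\\t//g'" % (fifo1, fifo2), shell=True)
w1.wait()
w2.wait()
print(out.decode())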
Process substitution (<()) is bash-only functionality. Thus, you need a shell, but it can't be just any shell (like /bin/sh, as used by shell=True on non-Windows platforms) -- it needs to be bash.

By the way, if you're going to be going this route with arbitrary filenames, pass them out-of-band (as below: passing _ as $0, File1.txt as $1, and File2.txt as $2):
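A minimal sketch of that idea (not necessarily the original code; the awk program is single-quoted inside the script so its $1 is left for awk, while bash expands "$1" and "$2" to the filenames):

import subprocess

subprocess.call([
    "bash", "-c",
    "comm -3 <(awk '{print $1}' \"$1\" | sort | uniq) "
    "<(awk '{print $1}' \"$2\" | sort | uniq) "
    "| grep -v '#' | sed 's/\\t//g'",
    "_", "File1.txt", "File2.txt",
])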
That said, the best-practices approach is indeed to set up the chain yourself. The below is tested with Python 3.6 (note the need for the pass_fds argument to subprocess.Popen to make the file descriptors referred to via /dev/fd/## links available):
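A sketch along those lines (not the answer's exact code; it assumes a /dev/fd filesystem as on Linux, and uses sort -u in place of sort | uniq):

import os
import subprocess

def column_source(filename):
    # Start awk | sort -u for one file; return the read end of a pipe carrying
    # its sorted, de-duplicated first column, plus the processes to wait on.
    read_fd, write_fd = os.pipe()
    awk = subprocess.Popen(["awk", "{print $1}", filename], stdout=subprocess.PIPE)
    sort = subprocess.Popen(["sort", "-u"], stdin=awk.stdout, stdout=write_fd)
    awk.stdout.close()   # let awk see SIGPIPE if sort exits early
    os.close(write_fd)   # the parent no longer needs the write end
    return read_fd, [awk, sort]

fd1, procs1 = column_source("File1.txt")
fd2, procs2 = column_source("File2.txt")

# pass_fds keeps fd1/fd2 open in the child, so comm can read them via /dev/fd/<n>.
comm = subprocess.Popen(
    ["comm", "-3", "/dev/fd/%d" % fd1, "/dev/fd/%d" % fd2],
    pass_fds=(fd1, fd2), stdout=subprocess.PIPE)
os.close(fd1)
os.close(fd2)

grep = subprocess.Popen(["grep", "-v", "#"], stdin=comm.stdout, stdout=subprocess.PIPE)
comm.stdout.close()
sed = subprocess.Popen(["sed", "s/\\t//g"], stdin=grep.stdout, stdout=subprocess.PIPE)
grep.stdout.close()

output = sed.communicate()[0]
for proc in procs1 + procs2 + [comm, grep]:
    proc.wait()
print(output.decode())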
This is a lot more code, but (assuming that the filenames are parameterized in the real world) it's also safer code -- you aren't vulnerable to bugs like ShellShock that are triggered by the simple act of starting a shell, and don't need to worry about passing variables out-of-band to avoid injection attacks (except in the context of arguments to commands -- like awk -- that are scripting language interpreters themselves).

That said, another thing to think about is just implementing the whole thing in native Python.
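For instance, a rough pure-Python sketch of the same logic (assuming whitespace-separated columns, which is what awk '{print $1}' splits on):

def first_column(path):
    # Collect the set of first-column values, like awk '{print $1}' | sort | uniq.
    with open(path) as handle:
        return {line.split()[0] for line in handle if line.strip()}

# comm -3 keeps lines unique to either file; that is the symmetric difference.
unique_to_either = first_column("File1.txt") ^ first_column("File2.txt")

for value in sorted(unique_to_either):
    if "#" not in value:   # same effect as grep -v "#"
        print(value)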