I am trying to format the following awk command
awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt
for use with Python's subprocess.Popen. However, I am having a hard time formatting it. I have tried the solutions suggested in similar answers, but none of them worked. I have also tried using raw string literals. I would also prefer not to use shell=True, as it is not recommended.
Edit according to comment:
The command I tried was:
awk_command = """awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt"""
command_execute = Popen(shlex.split(awk_command))
However, I get the following error when executing this:
KeyError: 'printf "chr%s\t%s\t%s\n", $1, $2-1, $2'
Googling the error suggests this happens when a value is requested for an undefined key, but I do not understand its context here.
The simplest method, especially if you wish to keep the output redirection, is to use subprocess with shell=True:
- Then you only need to escape Python special characters. The line, as a whole, will be interpreted by the default shell.
- WARNING: do not use this with untrusted input without sanitizing it first!
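A minimal sketch of the shell=True route, assuming awk is on PATH and file1.txt exists in the working directory. The raw string keeps \t and \n as two-character sequences so that awk, not Python, interprets them. The names SHELL_COMMAND and run_with_shell are made up for illustration.

```python
import subprocess

# Raw string: Python leaves \t and \n alone, so awk receives them as
# backslash escapes. The quoting is what you would type at a shell prompt.
SHELL_COMMAND = r"""awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt"""


def run_with_shell(command=SHELL_COMMAND):
    """Hand the whole line to the default shell, which also handles '>'."""
    # check_call raises CalledProcessError on a non-zero exit status.
    subprocess.check_call(command, shell=True)
```

Calling run_with_shell() should then write file2.txt, with the shell performing the redirection.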
Alternatively, you can replace the command line with an argv-type sequence and feed that to subprocess instead. Then you need to provide the arguments as the program would see them:
- remove all the shell-level escaping
- remove the output redirection stuff and do the redirection yourself instead
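The two steps above can be sketched roughly as follows, assuming awk is on PATH. The helper names build_awk_argv and run_awk are hypothetical; the awk program text is taken from the question. Note that the single quotes around the awk program and the double quotes around OFS are shell syntax, so they are dropped here.

```python
import subprocess

# The awk program from the question, without the shell's single quotes.
# Raw string: the backslash escapes are left for awk to interpret.
AWK_PROGRAM = r'{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'


def build_awk_argv(input_path):
    """Return the argument list as awk itself would see it."""
    return ["awk", "-v", r"OFS=\t", AWK_PROGRAM, input_path]


def run_awk(input_path, output_path):
    """Run awk without a shell and send its stdout to output_path."""
    with open(output_path, "wb") as output_file:
        # stdout=output_file replaces the shell's '> file2.txt'.
        subprocess.check_call(build_awk_argv(input_path), stdout=output_file)
```

With this in place, run_awk("file1.txt", "file2.txt") would be the equivalent of the original command line.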
Regarding the specific problems:
So, if you wish to use shell=False, construct the argument list yourself.
> is the shell redirection operator. To implement it in Python, use the stdout parameter:
#!/usr/bin/env python
import shlex
import subprocess
cmd = r"""awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'"""
with open('file2.txt', 'wb', 0) as output_file:
    subprocess.check_call(shlex.split(cmd) + ["file1.txt"], stdout=output_file)
To avoid starting a separate process, you could implement this particular awk command in pure Python.
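A rough sketch of what that could look like, assuming the second column always holds an integer and that blank lines can be skipped (awk itself would still emit a line for them). The function names convert_line and convert_file are made up for illustration.

```python
def convert_line(line):
    """Turn 'name position ...' into 'chr<name>', position-1, position (tab-separated)."""
    fields = line.split()  # like awk's default: split on runs of whitespace
    name, position = fields[0], fields[1]
    # Assumption: column 2 is an integer; awk would coerce other text to a number.
    return "chr%s\t%d\t%s\n" % (name, int(position) - 1, position)


def convert_file(input_path, output_path):
    """Apply convert_line to every non-blank line of input_path."""
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines; this differs from awk
                dst.write(convert_line(line))
```

convert_file("file1.txt", "file2.txt") would then stand in for the awk call, with no child process involved.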