Formatting a command in Python subprocess Popen

Posted 2019-01-15 23:35

Question:

I am trying to format the following awk command

awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt

for use in Python's subprocess.Popen. However, I am having a hard time formatting it. I have tried the solutions suggested in similar answers, but none of them worked. I have also tried using raw string literals. I would also prefer not to use shell=True, as this is not recommended.

Edit according to comment: The command I tried was

awk_command = """awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt"""
command_execute = Popen(shlex.split(awk_command))

However, I get the following error upon executing this:

KeyError: 'printf "chr%s\t%s\t%s\n", $1, $2-1, $2'

Googling the error suggests this happens when a value is requested for an undefined key, but I do not understand its context here.

Answer 1:

  1. The simplest method, especially if you wish to keep the output redirection, is to use subprocess with shell=True. Then you only need to escape the characters that are special to Python. The line, as a whole, will be interpreted by the default shell.

    • WARNING: do not use this with untrusted input without sanitizing it first!
  2. Alternatively, you can replace the command line with an argv-style sequence and pass that to subprocess instead. Then you need to provide each argument exactly as the program would see it:

    • remove all the shell-level escaping
    • remove the output redirection stuff and do the redirection yourself instead
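
A minimal sketch of option 1, assuming the file names might contain spaces or shell metacharacters. The helper names build_shell_command and run_with_shell are hypothetical and not part of the original answer; shlex.quote is the standard-library function for quoting a single shell word.

```python
import shlex
import subprocess

# Raw string: the backslashes reach the shell (and then awk) unchanged.
AWK_PROGRAM = r'{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'


def build_shell_command(input_path, output_path):
    """Return one shell command line with every variable part quoted."""
    return "awk -v OFS='\\t' {} {} > {}".format(
        shlex.quote(AWK_PROGRAM),
        shlex.quote(input_path),
        shlex.quote(output_path),
    )


def run_with_shell(input_path, output_path):
    """Run the command through the default shell (requires awk on PATH)."""
    subprocess.check_call(build_shell_command(input_path, output_path), shell=True)
```

Calling run_with_shell("file1.txt", "file2.txt") would then let the shell handle the redirection. Quoting protects against accidental word splitting, but it is still wise to treat untrusted input with care.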

Regarding the specific problems:

  • you didn't escape the characters that are special to Python in the string, so \t and \n became a literal tab and newline (try print(awk_command))
  • using shlex.split gains you nothing over shell=True: you still write a shell-style command line, with the added unreliability that it cannot guarantee it would parse the string the same way your shell would in every case (not to mention that it performs none of the transformations the shell makes).

    • Specifically, it doesn't know or care about the special meaning of the redirection part:

      >>> awk_command = """awk -v OFS="\\t" '{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}' file1.txt > file2.txt"""
      >>> shlex.split(awk_command)
      ['awk', '-v', 'OFS=\\t', '{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}', 'file1.txt', '>', 'file2.txt']
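
The first point above (unescaped \t and \n) can be checked directly. This small sketch compares a regular string literal with a raw one; the shortened awk program and the variable names are for illustration only.

```python
# In a regular literal, Python turns \t into a real tab before awk ever sees it.
plain = """awk -v OFS="\t" '{print $1}'"""
# In a raw literal, the backslash and the letter t are kept as two characters.
raw = r"""awk -v OFS="\t" '{print $1}'"""

# repr() makes the difference visible when printed.
print(repr(plain))
print(repr(raw))

has_real_tab = "\t" in plain       # a tab character is embedded in the string
raw_has_real_tab = "\t" in raw     # only a backslash followed by "t" is present
```

The raw form (or doubled backslashes, as in the example above) is what you want when awk itself should interpret the escape sequences.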
      

So, if you wish to use shell=False, do construct the argument list yourself.
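
A minimal sketch of building the argument list by hand, under the assumption that awk is on PATH. The helper names build_awk_argv and run_awk are hypothetical. Each list element is one argument exactly as awk receives it, and the redirection is replaced by the stdout parameter.

```python
import subprocess

# No shell quoting is needed here: the whole awk program is a single argument.
AWK_PROGRAM = r'{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'


def build_awk_argv(input_path):
    """Return the argument list for the awk call (no ">" in it)."""
    return ["awk", "-v", r"OFS=\t", AWK_PROGRAM, input_path]


def run_awk(input_path, output_path):
    """Run awk without a shell, sending its stdout to output_path."""
    with open(output_path, "wb") as output_file:
        subprocess.check_call(build_awk_argv(input_path), stdout=output_file)
```

With this in place, run_awk("file1.txt", "file2.txt") corresponds to the original command line, and file names with spaces need no special handling.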



Answer 2:

> is the shell redirection operator. To implement it in Python, use the stdout parameter:

#!/usr/bin/env python
import shlex
import subprocess

cmd = r"""awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'"""
with open('file2.txt', 'wb', 0) as output_file:
    subprocess.check_call(shlex.split(cmd) + ["file1.txt"], stdout=output_file)

To avoid starting a separate process, you could implement this particular awk command in pure Python.
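
One possible pure-Python version is sketched below. It assumes every non-blank line has at least two whitespace-separated fields and that the second field is an integer; the function names convert_line and convert_file are hypothetical. Unlike awk, this sketch skips blank lines instead of printing a row for them.

```python
def convert_line(line):
    """Return 'chr' + field 1, field 2 minus one, and field 2, tab-separated."""
    fields = line.split()        # like awk: split on runs of whitespace
    position = int(fields[1])    # assumption: column 2 holds an integer
    # The last column reuses the original text of field 2, as awk's $2 does.
    return "chr{}\t{}\t{}\n".format(fields[0], position - 1, fields[1])


def convert_file(input_path, output_path):
    """Convert every non-blank line of input_path and write to output_path."""
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            if line.strip():     # skip blank lines (awk would still print a row)
                dst.write(convert_line(line))
```

Calling convert_file("file1.txt", "file2.txt") would then replace the awk call entirely, at the cost of handling non-integer or missing fields yourself.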