Support an arbitrary number of related named arguments

Posted 2019-08-26 07:21

Question:

I'd like to support a command line interface where users can declare an arbitrary number of samples, with one or more input files corresponding to each sample. Something like this:

$ myprogram.py \
      --foo bar \
      --sample1 input1.tsv \
      --sample2 input2a.tsv input2b.tsv input2c.tsv \
      --sample3 input3-filtered.tsv \
      --out output.tsv

The idea is that the option keys will match the pattern --sample(\d+), and each key will consume all subsequent arguments as option values until the next - or -- prefixed flag is encountered. For explicitly declared arguments, this is a common use case that the argparse module supports with the nargs='+' option. But since I need to support an arbitrary number of arguments I can't declare them explicitly.

The parse_known_args method will give me access to all user-supplied arguments, but those not explicitly declared will not be grouped into an indexed data structure. For these I would need to carefully examine the argument list, look ahead to see how many of the subsequent values correspond to the current flag, etc.

Is there any way I can parse these options without having to essentially re-implement large parts of an argument parser (almost) from scratch?

Answer 1:

If you can live with a slightly different syntax, namely:

$ myprogram.py \
  --foo bar \
  --sample input1.tsv \
  --sample input2a.tsv input2b.tsv input2c.tsv \
  --sample input3-filtered.tsv \
  --out output.tsv

where the parameter name doesn't contain a number but the values are still grouped per occurrence, try this:

parser.add_argument('--sample', action='append', nargs='+')

It produces a list of lists, i.e. --sample x y --sample 1 2 will produce Namespace(sample=[['x', 'y'], ['1', '2']]).



Answer 2:

As I mentioned in my comment:

import argparse

argv = "myprogram.py \
      --foo bar \
      --sample1 input1.tsv \
      --sample2 input2a.tsv input2b.tsv input2c.tsv \
      --sample3 input3-filtered.tsv \
      --out output.tsv"

parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.add_argument('--out')
for x in range(1, argv.count('--sample') + 1):
    parser.add_argument('--sample' + str(x), nargs='+')
args = parser.parse_args(argv.split()[1:])

Gives:

print(args)
Namespace(foo='bar', out='output.tsv', sample1=['input1.tsv'], sample2=['input2a.tsv', 'input2b.tsv', 'input2c.tsv'], sample3=['input3-filtered.tsv'])

With the real sys.argv, you'll probably have to replace argv.count with the slightly longer ' '.join(sys.argv).count('--sample').

The major downside of this approach is that the automatically generated help will not cover these fields.
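Counting substrings also assumes that the samples are numbered 1..N without gaps and that '--sample' never appears inside another argument, such as a file name. A slightly more robust variant, sketched here, scans the argument list with a regular expression and declares exactly the flags it finds. The helper name build_parser and the sample7 flag are made up for illustration, and the --sampleN=value form is not handled.

```python
import argparse
import re

SAMPLE_FLAG = re.compile(r'--sample\d+')

def build_parser(argv):
    """Declare --foo/--out plus every --sampleN flag found in argv (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--foo')
    parser.add_argument('--out')
    seen = []
    for token in argv:
        # Register each distinct --sampleN flag once, in order of first appearance.
        if SAMPLE_FLAG.fullmatch(token) and token not in seen:
            seen.append(token)
            parser.add_argument(token, nargs='+')
    return parser

# Argument list without the program name, as sys.argv[1:] would provide it.
argv = ['--foo', 'bar',
        '--sample1', 'input1.tsv',
        '--sample7', 'input7a.tsv', 'input7b.tsv',
        '--out', 'output.tsv']
args = build_parser(argv).parse_args(argv)
print(args.sample1, args.sample7, args.out)
```

With the real command line this would presumably become build_parser(sys.argv[1:]).parse_args(). The help output still lists only the sample flags that were actually passed.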



Answer 3:

It would be simpler to make that number or key a separate argument value, and collect the related arguments in a nested list.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.add_argument('--out')
parser.add_argument('--sample', nargs='+', action='append', metavar=('KEY','TSV'))

parser.print_help()

argv = "myprogram.py \
      --foo bar \
      --sample 1 input1.tsv \
      --sample 2 input2a.tsv input2b.tsv input2c.tsv \
      --sample 3 input3-filtered.tsv \
      --out output.tsv"
argv = argv.split()
args = parser.parse_args(argv[1:])
print(args)

produces:

1031:~/mypy$ python3 stack44267794.py -h
usage: stack44267794.py [-h] [--foo FOO] [--out OUT] [--sample KEY [TSV ...]]

optional arguments:
  -h, --help            show this help message and exit
  --foo FOO
  --out OUT
  --sample KEY [TSV ...]
Namespace(foo='bar', out='output.tsv', 
    sample=[['1', 'input1.tsv'], 
            ['2', 'input2a.tsv', 'input2b.tsv', 'input2c.tsv'], 
            ['3', 'input3-filtered.tsv']])
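If a mapping is more convenient than a list of lists, the first element of each group can be split off as the key. This is only a sketch: the name samples is an illustrative choice, a repeated key silently overwrites the earlier group, and a group given as just --sample KEY ends up with an empty file list.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--sample', nargs='+', action='append', metavar=('KEY', 'TSV'))

args = parser.parse_args(['--sample', '1', 'input1.tsv',
                          '--sample', '2', 'input2a.tsv', 'input2b.tsv'])

# The first element of each group is the key; the rest are that sample's files.
samples = {key: files for key, *files in args.sample}
print(samples)
```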

There have been questions about collecting general key:value pairs. There's nothing in argparse to directly support that. Various things have been suggested, but all boil down to parsing the pairs yourself.

See, for example: "Is it possible to use argparse to capture an arbitrary set of optional arguments?"

You have added the complication that the number of arguments per key is variable. That rules out handling '--sample1=input1' as simple strings.

argparse extends a well-known POSIX command-line standard. But if you want to move beyond that, be prepared to process the arguments yourself, either before argparse (sys.argv) or after it (the parse_known_args extras).
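As a rough sketch of the "after argparse" route, the extras returned by parse_known_args can be walked once and grouped by flag. The helper group_samples is hypothetical, not part of argparse. It also assumes that each sample's values directly follow its flag: a stray value after a declared option such as --foo would land in the extras too and be attributed to the preceding sample.

```python
import argparse
import re

def group_samples(extras):
    """Group a flat ['--sample1', 'a', '--sample2', 'b', 'c'] list by sample number (hypothetical helper)."""
    samples = {}
    current = None
    for token in extras:
        match = re.fullmatch(r'--sample(\d+)', token)
        if match:
            # Start (or continue) the group for this sample number.
            current = samples.setdefault(int(match.group(1)), [])
        elif token.startswith('-') or current is None:
            # Unknown flag, or a value that precedes any --sampleN flag.
            raise ValueError('unexpected argument: ' + token)
        else:
            current.append(token)
    return samples

parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.add_argument('--out')

argv = ['--foo', 'bar',
        '--sample1', 'input1.tsv',
        '--sample2', 'input2a.tsv', 'input2b.tsv', 'input2c.tsv',
        '--sample3', 'input3-filtered.tsv',
        '--out', 'output.tsv']
# Declared options go into args; everything else comes back as a flat list.
args, extras = parser.parse_known_args(argv)
samples = group_samples(extras)
print(args, samples)
```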



Answer 4:

It may well be possible to do the sort of thing that you are looking for with click rather than argparse.

To quote:

$ click_

Click is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary. It's the "Command Line Interface Creation Kit". It's highly configurable but comes with sensible defaults out of the box.

It aims to make the process of writing command line tools quick and fun while also preventing any frustration caused by the inability to implement an intended CLI API.

Click in three points:

  • arbitrary nesting of commands
  • automatic help page generation
  • supports lazy loading of subcommands at runtime

    Read the docs at http://click.pocoo.org/

One of the important features of click is the ability to construct sub-commands (a bit like git or ImageMagick's convert), which should allow you to structure your command line as:

myprogram.py \
  --foo bar \
  --sampleset input1.tsv \
  --sampleset input2a.tsv input2b.tsv input2c.tsv \
  --sampleset input3-filtered.tsv \
  combinesets --out output.tsv

Or even:

myprogram.py \
  --foo bar \
  process input1.tsv \
  process input2a.tsv input2b.tsv input2c.tsv \
  process input3-filtered.tsv \
  combine --out output.tsv

This might be cleaner. In this case your code would have parameters called --foo and --out and functions called process and combine; process would be called with the specified input file(s), and combine with no parameters.