argparse: on-demand imports for types, choices, etc.

Posted 2019-02-25 10:45

I have a fairly large program whose CLI is built on argparse, with several subparsers. The supported choices for the subparsers' arguments are determined by DB queries, parsing various XML files, running calculations, and so on, so building them is I/O-intensive and time-consuming.

The problem is that argparse seems to fetch the choices for every subparser when I run the script, which adds a considerable and annoying startup delay.

Is there a way to make argparse fetch and validate choices only for the subparser that is actually being used?

One solution could be to move all the validation logic deeper inside the code, but that would mean quite a lot of work, which I would like to avoid if possible.

Thank you

4 answers
放荡不羁爱自由
#2 · 2019-02-25 11:12

To delay the fetching of choices, you could parse the command line in two stages: in the first stage you identify only the subparser, and in the second stage that subparser parses the rest of the arguments:

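import argparse

# Stage 1: the top-level parser only identifies which subparser to use.
parser = argparse.ArgumentParser()
parser.add_argument('subparser', choices=['foo','bar'])

# Stage 2: each factory builds its parser on demand, so its choices
# are computed only when that subparser is actually selected.
def foo_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('fooval', choices='123')
    return parser

def bar_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('barval', choices='ABC')
    return parser

dispatch = {'foo':foo_parser, 'bar':bar_parser}
# parse_known_args leaves the arguments it does not recognize in `unknown`
args, unknown = parser.parse_known_args()
args = dispatch[args.subparser]().parse_args(unknown)
print(args)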

It could be used like this:

% script.py foo 2
Namespace(fooval='2')

% script.py bar A
Namespace(barval='A')

Note that the top-level help message will be less friendly, since it can only tell you about the subparser choices:

% script.py -h
usage: script.py [-h] {foo,bar}
...

To find information about the choices in each subparser, the user would have to select the subparser and pass -h to it:

% script.py bar -- -h
usage: script.py [-h] {A,B,C}

All arguments after the -- are considered non-options (to script.py) and are thus parsed by the bar_parser.

放荡不羁爱自由
#3 · 2019-02-25 11:13

This script tests the idea of delaying the creation of a subparser until it is actually needed. In theory this might save startup time, because only the subparser that is used gets created.

I use nargs=argparse.PARSER to replicate the subparser behavior in the main parser. The help behavior is similar.

# lazy subparsers test
# lazy behaves much like a regular subparser case, but only creates one subparser
# for N=5 time differences do not rise above the noise

import argparse

def regular(N):
    parser = argparse.ArgumentParser()
    sp = parser.add_subparsers(dest='cmd')
    for i in range(N):
        spp = sp.add_parser('cmd%s'%i)
        spp.set_defaults(func='cmd%s'%(10*i))
        spp.add_argument('-f','--foo')
        spp.add_argument('pos', nargs='*')
    return parser

def lazy(N):
    parser = argparse.ArgumentParser()
    sp = parser.add_argument('cmd', nargs=argparse.PARSER, choices=[])
    for i in range(N):
        sp.choices.append('cmd%s'%i)
    return parser

def subpar(cmd):
    cmd, argv = cmd[0], cmd[1:]
    parser = argparse.ArgumentParser(prog=cmd)
    parser.add_argument('-f','--foo')
    parser.add_argument('pos', nargs='*')
    parser.set_defaults(func=cmd)
    args = parser.parse_args(argv)
    return args

N = 5
mode = True #False
argv = 'cmd1 -f1 a b c'.split()
if mode:
    args = regular(N).parse_args(argv)
    print(args)
else:
    args = lazy(N).parse_args(argv)
    print(args)
    if isinstance(args.cmd, list):
        sargs = subpar(args.cmd)
        print(sargs)

Test runs with mode=True and then mode=False (N=5 in both cases):

1004:~/mypy$ time python3 stack44315696.py 
Namespace(cmd='cmd1', foo='1', func='cmd10', pos=['a', 'b', 'c'])

real    0m0.052s
user    0m0.044s
sys     0m0.008s
1011:~/mypy$ time python3 stack44315696.py 
Namespace(cmd=['cmd1', '-f1', 'a', 'b', 'c'])
Namespace(foo='1', func='cmd1', pos=['a', 'b', 'c'])

real    0m0.051s
user    0m0.048s
sys     0m0.000s

N has to be much larger before an effect becomes visible.
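To look for that effect without editing the script between runs, the construction cost of the two variants can be timed directly. This is only a rough sketch: it measures parser construction (not parsing), the value N=500 is an arbitrary choice of mine, and the numbers will vary by machine and Python version.

```python
import argparse
import timeit


def regular(n):
    # One real subparser per command, all built up front.
    parser = argparse.ArgumentParser()
    sp = parser.add_subparsers(dest="cmd")
    for i in range(n):
        spp = sp.add_parser("cmd%s" % i)
        spp.add_argument("-f", "--foo")
        spp.add_argument("pos", nargs="*")
    return parser


def lazy(n):
    # Only the command names are registered; no subparser objects are created.
    parser = argparse.ArgumentParser()
    parser.add_argument("cmd", nargs=argparse.PARSER,
                        choices=["cmd%s" % i for i in range(n)])
    return parser


for n in (5, 500):
    # Take the best of three single runs to reduce noise.
    t_regular = min(timeit.repeat(lambda: regular(n), number=1, repeat=3))
    t_lazy = min(timeit.repeat(lambda: lazy(n), number=1, repeat=3))
    print("N=%d  regular=%.4fs  lazy=%.4fs" % (n, t_regular, t_lazy))
```

Timing inside the process also leaves out interpreter startup, which dominates the `time python3 ...` figures above.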

混吃等死
#4 · 2019-02-25 11:14

I have solved the issue by creating a simple ArgumentParser subclass:

import argparse

class ArgumentParser(argparse.ArgumentParser):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.lazy_init = None

    def parse_known_args(self, args=None, namespace=None):
        if self.lazy_init is not None:
            self.lazy_init()
            self.lazy_init = None

        return super().parse_known_args(args, namespace)

Then I can use it as follows:

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest='command', title='commands', parser_class=ArgumentParser)
subparsers.required = True

subparser = subparsers.add_parser(
    'do-something', help="do something",
    description="Do something great.",
)

def lazy_init():
    from my_database import data

    subparser.add_argument(
        '-o', '--option', choices=data.expensive_fetch(), action='store',
    )

subparser.lazy_init = lazy_init

This initializes a sub-parser only when the parent parser actually hands it arguments to parse. So program -h will not initialize the sub-parser, but program do-something -h will.
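The behavior can be checked with a self-contained variant. In this sketch, expensive_fetch, the fetch_calls list, and the fast/slow command names are hypothetical stand-ins for the real database call and commands; the subclass follows the same pattern as the one above.

```python
import argparse

fetch_calls = []  # records every run of the stand-in fetch


def expensive_fetch():
    # Hypothetical placeholder for data.expensive_fetch().
    fetch_calls.append(1)
    return ["red", "green", "blue"]


class LazyArgumentParser(argparse.ArgumentParser):
    """Runs an optional lazy_init callback once, just before parsing."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lazy_init = None

    def parse_known_args(self, args=None, namespace=None):
        if self.lazy_init is not None:
            self.lazy_init()
            self.lazy_init = None
        return super().parse_known_args(args, namespace)


parser = argparse.ArgumentParser(prog="program")
subparsers = parser.add_subparsers(dest="command", parser_class=LazyArgumentParser)
subparsers.required = True

fast = subparsers.add_parser("fast")   # no expensive setup
slow = subparsers.add_parser("slow")   # choices are fetched on demand


def init_slow():
    slow.add_argument("-o", "--option", choices=expensive_fetch())


slow.lazy_init = init_slow

first = parser.parse_args(["fast"])
calls_after_fast = len(fetch_calls)    # the fetch has not run yet
second = parser.parse_args(["slow", "-o", "green"])
calls_after_slow = len(fetch_calls)    # the fetch ran for the "slow" command
print(first, calls_after_fast, second, calls_after_slow)
```

Because lazy_init is reset to None after its first run, parsing the same command again in one process does not repeat the fetch.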

地球回转人心会变
#5 · 2019-02-25 11:21

Here's a quick and dirty example of 'lazy' choices. In this case the choices are a range of integers. I think a case that requires expensive DB lookups could be implemented in a similar fashion.

# argparse with lazy choices

class LazyChoice(object):
    # large range
    def __init__(self, argmax):
        self.argmax=argmax
    def __contains__(self, item):
        # a 'lazy' test that does not enumerate all choices
        return item<=self.argmax
    def __iter__(self):
        # iterable for display in error message
        # use is in:
        # tup = value, ', '.join(map(repr, action.choices))
        # metavar bypasses this when formatting help/usage
        return iter(['integers no greater than %s'%self.argmax])

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--regular','-r',choices=['one','two'])
larg = parser.add_argument('--lazy','-l', choices=LazyChoice(10))
larg.type = int
print(parser.parse_args())

Implementing the testing part (__contains__) is easy. The help/usage can be customized with the help and metavar attributes. Customizing the error message is harder. http://bugs.python.org/issue16468 discusses alternatives when choices are not iterable; http://bugs.python.org/issue16418 covers long lists of choices.
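The same container idea can be extended so that the choices themselves are fetched only on first use. The following is a minimal sketch, not part of the example above: DeferredChoices, fetch_colors, and the loads list are hypothetical names, and fetch_colors stands in for the expensive lookup.

```python
import argparse


class DeferredChoices:
    """Container whose contents are fetched on first use, then cached."""

    def __init__(self, loader):
        self._loader = loader   # zero-argument callable, e.g. a DB query
        self._values = None

    def _load(self):
        if self._values is None:
            self._values = list(self._loader())
        return self._values

    def __contains__(self, item):
        # Called by argparse when it validates a supplied value.
        return item in self._load()

    def __iter__(self):
        # Called when argparse lists the choices, e.g. in an error message.
        return iter(self._load())


loads = []


def fetch_colors():
    # Hypothetical stand-in for an expensive lookup.
    loads.append(1)
    return ["red", "green", "blue"]


parser = argparse.ArgumentParser()
# An explicit metavar keeps the usage line from listing the choices.
parser.add_argument("--color", choices=DeferredChoices(fetch_colors), metavar="COLOR")
parser.add_argument("--verbose", action="store_true")

args_plain = parser.parse_args(["--verbose"])
loads_after_plain = len(loads)          # --color was not given, so nothing was fetched
args_color = parser.parse_args(["--color", "green"])
loads_after_color = len(loads)          # fetched once, on first validation
print(args_plain, loads_after_plain, args_color, loads_after_color)
```

The fetch happens only when a value for --color has to be validated or the choices have to be listed, so commands that never touch that option do not pay for it.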

I've also shown how the type can be changed after the initial setup. That doesn't solve the problem of setting the type based on the subparser choice. But it isn't hard to write a custom type, one that does some sort of DB lookup. All a type function needs to do is take a string, return the correctly converted value, and raise ValueError if there's a problem.
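A type function along those lines might look like the sketch below. The names load_valid_names and known_name are hypothetical, and the hard-coded set stands in for the DB lookup; functools.lru_cache ensures the lookup runs at most once, and only when a value is actually converted.

```python
import argparse
import functools


@functools.lru_cache(maxsize=None)
def load_valid_names():
    # Hypothetical stand-in for an expensive DB query or XML parse.
    return frozenset({"alpha", "beta", "gamma"})


def known_name(text):
    """argparse type function: validate text against lazily loaded names."""
    if text not in load_valid_names():
        # argparse reports a ValueError from a type function as a usage error.
        raise ValueError(text)
    return text


parser = argparse.ArgumentParser()
parser.add_argument("--name", type=known_name)

args = parser.parse_args(["--name", "beta"])
print(args)
```

Since the check lives in type rather than choices, nothing is fetched while the parser is being built, and an invalid value still makes argparse exit with a usage error.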
