pythonic format for indices

2020-03-19 03:09发布

问题:

I am after a string format to efficiently represent a set of indices. For example "1-3,6,8-10,16" would produce [1,2,3,6,8,9,10,16]

Ideally I would also be able to represent infinite sequences.

Is there an existing standard way of doing this? Or a good library? Or can you propose your own format?

thanks!

Edit: Wow! - thanks for all the well considered responses. I agree I should use ':' instead. Any ideas about infinite lists? I was thinking of using "1.." to represent all positive numbers.

The use case is for a shopping cart. For some products I need to restrict product sales to multiples of X, for others any positive number. So I am after a string format to represent this in the database.

回答1:

If you're into something Pythonic, I think 1:3,6,8:10,16 would be a better choice, as x:y is a standard notation for index range and the syntax allows you to use this notation on objects. Note that the call

z[1:3,6,8:10,16]

gets translated into

z.__getitem__((slice(1, 3, None), 6, slice(8, 10, None), 16))

Even though this is a TypeError if z is a built-in container, you're free to create the class that will return something reasonable, e.g. as NumPy's arrays.

You might also say that by convention 5: and :5 represent infinite index ranges (this is a bit stretched as Python has no built-in types with negative or infinitely large positive indexes).

And here's the parser (a beautiful one-liner that suffers from slice(16, None, None) glitch described below):

def parse(s):
    return [slice(*map(int, x.split(':'))) for x in s.split(',')]

There's one pitfall, however: 8:10 by definition includes only indices 8 and 9 -- without upper bound. If that's unacceptable for your purposes, you certainly need a different format and 1-3,6,8-10,16 looks good to me. The parser then would be

def myslice(start, stop=None, step=None):
    return slice(start, (stop if stop is not None else start) + 1, step)

def parse(s):
    return [myslice(*map(int, x.split('-'))) for x in s.split(',')]

Update: here's the full parser for a combined format:

from sys import maxsize as INF

def indices(s: 'string with indices list') -> 'indices generator':
    for x in s.split(','):
        splitter = ':' if (':' in x) or (x[0] == '-') else '-'
        ix = x.split(splitter)
        start = int(ix[0]) if ix[0] is not '' else -INF
        if len(ix) == 1:
            stop = start + 1
        else:
            stop = int(ix[1]) if ix[1] is not '' else INF
        step = int(ix[2]) if len(ix) > 2 else 1
        for y in range(start, stop + (splitter == '-'), step):
            yield y

This handles negative numbers as well, so

 print(list(indices('-5, 1:3, 6, 8:15:2, 20-25, 18')))

prints

[-5, 1, 2, 6, 7, 8, 10, 12, 14, 20, 21, 22, 23, 24, 25, 18, 19]

Yet another alternative is to use ... (which Python recognizes as the built-in constant Ellipsis so you can call z[...] if you want) but I think 1,...,3,6, 8,...,10,16 is less readable.



回答2:

You don't need a string for that, This is as simple as it can get:

from types import SliceType

class sequence(object):
    def __getitem__(self, item):
        for a in item:
            if isinstance(a, SliceType):
                i = a.start
                step = a.step if a.step else 1
                while True:
                    if a.stop and i > a.stop:
                        break
                    yield i
                    i += step
            else:
                yield a

print list(sequence()[1:3,6,8:10,16])

Output:

[1, 2, 3, 6, 8, 9, 10, 16]

I'm using Python slice type power to express the sequence ranges. I'm also using generators to be memory efficient.

Please note that I'm adding 1 to the slice stop, otherwise the ranges will be different because the stop in slices is not included.

It supports steps:

>>> list(sequence()[1:3,6,8:20:2])
[1, 2, 3, 6, 8, 10, 12, 14, 16, 18, 20]

And infinite sequences:

sequence()[1:3,6,8:]
1, 2, 3, 6, 8, 9, 10, ...

If you have to give it a string then you can combine @ilya n. parser with this solution. I'll extend @ilya n. parser to support indexes as well as ranges:

def parser(input):
    ranges = [a.split('-') for a in input.split(',')]
    return [slice(*map(int, a)) if len(a) > 1 else int(a[0]) for a in ranges]

Now you can use it like this:

>>> print list(sequence()[parser('1-3,6,8-10,16')])
[1, 2, 3, 6, 8, 9, 10, 16]


回答3:

This is probably about as lazily as it can be done, meaning it will be okay for even very large lists:

def makerange(s):
    for nums in s.split(","): # whole list comma-delimited
        range_ = nums.split("-") # number might have a dash - if not, no big deal
        start = int(range_[0])
        for i in xrange(start, start + 1 if len(range_) == 1 else int(range_[1]) + 1):
            yield i

s = "1-3,6,8-10,16"
print list(makerange(s))

output:

[1, 2, 3, 6, 8, 9, 10, 16]


回答4:

import sys

class Sequencer(object):
    def __getitem__(self, items):
        if not isinstance(items, (tuple, list)):
            items = [items]
        for item in items:
            if isinstance(item, slice):
                for i in xrange(*item.indices(sys.maxint)):
                    yield i
            else:
                yield item


>>> s = Sequencer()
>>> print list(s[1:3,6,8:10,16])
[1, 2, 6, 8, 9, 16]

Note that I am using the xrange builtin to generate the sequence. That seems awkward at first because it doesn't include the upper number of sequences by default, however it proves to be very convenient. You can do things like:

>>> print list(s[1:10:3,5,5,16,13:5:-1])
[1, 4, 7, 5, 5, 16, 13, 12, 11, 10, 9, 8, 7, 6]

Which means you can use the step part of xrange.



回答5:

This looked like a fun puzzle to go with my coffee this morning. If you settle on your given syntax (which looks okay to me, with some notes at the end), here is a pyparsing converter that will take your input string and return a list of integers:

from pyparsing import *

integer = Word(nums).setParseAction(lambda t : int(t[0]))
intrange = integer("start") + '-' + integer("end")
def validateRange(tokens):
    if tokens.from_ > tokens.to:
        raise Exception("invalid range, start must be <= end")
intrange.setParseAction(validateRange)
intrange.addParseAction(lambda t: list(range(t.start, t.end+1)))

indices = delimitedList(intrange | integer)

def mergeRanges(tokens):
    ret = set()
    for item in tokens:
        if isinstance(item,int):
            ret.add(item)
        else:
            ret += set(item)
    return sorted(ret)

indices.setParseAction(mergeRanges)

test = "1-3,6,8-10,16"
print indices.parseString(test)

This also takes care of any overlapping or duplicate entries, such "3-8,4,6,3,4", and returns a list of just the unique integers.

The parser takes care of validating that ranges like "10-3" are not allowed. If you really wanted to allow this, and have something like "1,5-3,7" return 1,5,4,3,7, then you could tweak the intrange and mergeRanges parse actions to get this simpler result (and discard the validateRange parse action altogether).

You are very likely to get whitespace in your expressions, I assume that this is not significant. "1, 2, 3-6" would be handled the same as "1,2,3-6". Pyparsing does this by default, so you don't see any special whitespace handling in the code above (but it's there...)

This parser does not handle negative indices, but if that were needed too, just change the definition of integer to:

integer = Combine(Optional('-') + Word(nums)).setParseAction(lambda t : int(t[0]))

Your example didn't list any negatives, so I left it out for now.

Python uses ':' for a ranging delimiter, so your original string could have looked like "1:3,6,8:10,16", and Pascal used '..' for array ranges, giving "1..3,6,8..10,16" - meh, dashes are just as good as far as I'm concerned.