How should I handle inclusive ranges in Python?

Posted 2019-02-02 20:33

Question:

I am working in a domain in which ranges are conventionally described inclusively. I have human-readable descriptions such as from A to B , which represent ranges that include both end points - e.g. from 2 to 4 means 2, 3, 4.

What is the best way to work with these ranges in Python code? The following code works to generate inclusive ranges of integers, but I also need to perform inclusive slice operations:

def inclusive_range(start, stop, step):
    return range(start, (stop + 1) if step >= 0 else (stop - 1), step)

The only complete solution I see is to explicitly use + 1 (or - 1) every time I use range or slice notation (e.g. range(A, B + 1), l[A:B+1], range(B, A - 1, -1)). Is this repetition really the best way to work with inclusive ranges?

Edit: Thanks to L3viathan for answering. Writing an inclusive_slice function to complement inclusive_range is certainly an option, although I would probably write it as follows:

def inclusive_slice(start, stop, step):
    ...
    return slice(start, (stop + 1) if step >= 0 else (stop - 1), step)

... here represents code to handle negative indices, which are not straightforward when used with slices - note, for example, that L3viathan's function gives incorrect results if slice_to == -1.

However, it seems that an inclusive_slice function would be awkward to use - is l[inclusive_slice(A, B)] really any better than l[A:B+1]?

Is there any better way to handle inclusive ranges?

Edit 2: Thank you for the new answers. I agree with Francis and Corley that changing the meaning of slice operations, either globally or for certain classes, would lead to significant confusion. I am therefore now leaning towards writing an inclusive_slice function.

To answer my own question from the previous edit, I have come to the conclusion that using such a function (e.g. l[inclusive_slice(A, B)]) would be better than manually adding/subtracting 1 (e.g. l[A:B+1]), since it would allow edge cases (such as B == -1 and B == None) to be handled in a single place. Can we reduce the awkwardness in using the function?
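A minimal sketch of what handling those edge cases in a single place could look like. The `...` in the function above is left as written; the logic below is only one assumed way to cover B == -1 and B == None, and the default arguments are added for illustration.

```python
def inclusive_slice(start=None, stop=None, step=None):
    """Build a built-in slice whose stop index is included (illustrative sketch)."""
    forward = step is None or step > 0
    if stop is not None:
        stop += 1 if forward else -1
        # Forward: an inclusive stop of -1 becomes 0, which would select nothing.
        # Backward: an inclusive stop of 0 becomes -1, which would mean "last item".
        # In both cases the intent is "run to the end", which slices spell as None.
        if stop == (0 if forward else -1):
            stop = None
    return slice(start, stop, step)


data = list(range(10))
print(data[inclusive_slice(2, 4)])      # [2, 3, 4]
print(data[inclusive_slice(7, -1)])     # [7, 8, 9]
print(data[inclusive_slice(3, 0, -1)])  # [3, 2, 1, 0]
print(data[inclusive_slice(8, None)])   # [8, 9]
```

Because the function returns an ordinary slice object, it works with any sequence type that accepts slices.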

Edit 3: I have been thinking about how to improve the usage syntax, which currently looks like l[inclusive_slice(1, 5, 2)]. In particular, it would be good if the creation of an inclusive slice resembled standard slice syntax. In order to allow this, instead of inclusive_slice(start, stop, step), there could be a function inclusive that takes a slice as a parameter. The ideal usage syntax for inclusive would be line 1:

l[inclusive(1:5:2)]          # 1
l[inclusive(slice(1, 5, 2))] # 2
l[inclusive(s_[1:5:2])]      # 3
l[inclusive[1:5:2]]          # 4
l[1:inclusive(5):2]          # 5

Unfortunately this is not permitted by Python, which only allows the use of : syntax within []. inclusive would therefore have to be called using either syntax 2 or 3 (where s_ acts like the version provided by numpy).

Other possibilities are to make inclusive into an object with __getitem__, permitting syntax 4, or to apply inclusive only to the stop parameter of the slice, as in syntax 5. Unfortunately I do not believe the latter can be made to work since inclusive requires knowledge of the step value.
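Syntax 4 could be sketched roughly as follows. The class name `_Inclusive` is hypothetical, and the edge-case handling is an assumption rather than something settled in this thread; the point is only that `__getitem__` receives the slice object, step included.

```python
class _Inclusive:
    """Hypothetical helper: inclusive[a:b:c] returns an adjusted built-in slice."""

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("inclusive[...] expects slice syntax, e.g. inclusive[1:5]")
        forward = key.step is None or key.step > 0
        stop = key.stop
        if stop is not None:
            stop += 1 if forward else -1
            if stop == (0 if forward else -1):
                stop = None  # the range runs to the end of the sequence
        return slice(key.start, stop, key.step)


inclusive = _Inclusive()

l = list(range(10))
print(l[inclusive[1:5:2]])   # [1, 3, 5]
print(l[inclusive[5:1:-2]])  # [5, 3, 1]
```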

Of the workable syntaxes (the original l[inclusive_slice(1, 5, 2)], plus 2, 3 and 4), which would be the best to use? Or is there another, better option?

Final Edit: Thank you all for the replies and comments, this has been very interesting. I have always been a fan of Python's "one way to do it" philosophy, but this issue has been caused by a conflict between Python's "one way" and the "one way" prescribed by the problem domain. I have definitely gained some appreciation for TIMTOWTDI in language design.

For giving the first and highest-voted answer, I award the bounty to L3viathan.

Answer 1:

Write an additional function for inclusive slice, and use that instead of slicing. While it would be possible to e.g. subclass list and implement a __getitem__ reacting to a slice object, I would advise against it, since your code will behave contrary to expectation for anyone but you — and probably to you, too, in a year.

inclusive_slice could look like this:

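def inclusive_slice(myList, slice_from=None, slice_to=None, step=1):
    if slice_to is not None:
        slice_to += 1 if step > 0 else -1
        # Stepping forward past -1 gives 0 and stepping backward past 0 gives -1;
        # both mean "run to the end", which a slice expresses as None.
        if (step > 0 and slice_to == 0) or (step < 0 and slice_to == -1):
            slice_to = None
    return myList[slice_from:slice_to:step]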

What I would do personally, is just use the "complete" solution you mentioned (range(A, B + 1), l[A:B+1]) and comment well.



Answer 2:

Since the ending index is always exclusive in Python, it's worth considering always using the "Python-convention" values internally. This way, you will save yourself from mixing up the two in your code.

Only ever deal with the "external representation" through dedicated conversion subroutines:

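import re

def text2range(text):
    # Parse inclusive "from A to B" text into a half-open (start, end) pair.
    m = re.match(r"from (\d+) to (\d+)", text)
    start, end = int(m.group(1)), int(m.group(2)) + 1
    return start, end

def range2text(start, end):
    # Print a half-open (start, end) pair in the inclusive external form.
    print("from %d to %d" % (start, end - 1))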

Alternatively, you can mark the variables holding the "unusual" representation with the true Hungarian notation.



Answer 3:

If you don't want to specify the step size but rather the number of steps, there is the option of using numpy.linspace, which includes both the starting and ending points:

import numpy as np

np.linspace(0,5,4)
# array([ 0.        ,  1.66666667,  3.33333333,  5.        ])


Answer 4:

I believe that the standard answer is to just use +1 or -1 everywhere it is needed.

You don't want to globally change the way slices are understood (that will break plenty of code), but another solution would be to build a class hierarchy for the objects for which you wish the slices to be inclusive. For example, for a list:

class InclusiveList(list):
    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.start, index.stop, index.step
            if index.stop is not None:
                if index.step is None:
                    stop += 1
                else:
                    if index.step >= 0:
                        stop += 1
                    else:
                        if stop == 0: 
                            stop = None # going from [4:0:-1] to [4::-1] since [4:-1:-1] wouldn't work 
                        else:
                            stop -= 1
            return super().__getitem__(slice(start, stop, step))
        else:
            return super().__getitem__(index)

>>> a = InclusiveList([1, 2, 4, 8, 16, 32])
>>> a
[1, 2, 4, 8, 16, 32]
>>> a[4]
16
>>> a[2:4]
[4, 8, 16]
>>> a[3:0:-1]
[8, 4, 2, 1]
>>> a[3::-1]
[8, 4, 2, 1]
>>> a[5:1:-2]
[32, 8, 2]

Of course, you want to do the same with __setitem__ and __delitem__.

(I used a list but that works for any Sequence or MutableSequence.)
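One way the `__setitem__` and `__delitem__` counterparts might be written is to move the index adjustment into a shared helper. This is a sketch only: the helper name `_inclusive` is hypothetical, and unlike the class above it also maps a forward stop of -1 to None, which is an added assumption.

```python
def _inclusive(index):
    """Convert an inclusive slice to the equivalent half-open slice."""
    if not isinstance(index, slice) or index.stop is None:
        return index  # plain integer indices and open-ended slices are unchanged
    forward = index.step is None or index.step > 0
    stop = index.stop + (1 if forward else -1)
    if stop == (0 if forward else -1):
        stop = None  # the adjusted stop would wrap around, so run to the end
    return slice(index.start, stop, index.step)


class InclusiveList(list):
    def __getitem__(self, index):
        return super().__getitem__(_inclusive(index))

    def __setitem__(self, index, value):
        super().__setitem__(_inclusive(index), value)

    def __delitem__(self, index):
        super().__delitem__(_inclusive(index))


a = InclusiveList([1, 2, 4, 8, 16, 32])
a[1:2] = [0, 0]   # replaces the items at indices 1 and 2
print(list(a))    # [1, 0, 0, 8, 16, 32]
del a[0:1]        # removes the items at indices 0 and 1
print(list(a))    # [0, 8, 16, 32]
```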



Answer 5:

Without writing your own class, the function seems to be the way to go. The most I can think of is not storing actual lists, but just returning generators for the range you care about. Since we're now talking about usage syntax, here is what you could do:

def closed_range(slices):
    slice_parts = slices.split(':')
    [start, stop, step] = map(int, slice_parts)
    num = start
    if start <= stop and step > 0:
        while num <= stop:
            yield num
            num += step
    # if negative step
    elif step < 0:
        while num >= stop:
            yield num
            num += step

And then use as:

list(closed_range('1:5:2'))
[1,3,5]

Of course you'll need to also check for other forms of bad input if anyone else is going to use this function.
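As a rough illustration of what such input checking might involve, here is a variant that rejects malformed strings and a zero step. Treating the step as optional (defaulting to 1) is an assumption added for this sketch, not part of the function above.

```python
def closed_range(spec):
    """Yield integers for a 'start:stop[:step]' string, including stop when reachable."""
    parts = spec.split(':')
    if len(parts) not in (2, 3):
        raise ValueError("expected 'start:stop' or 'start:stop:step', got %r" % spec)
    start, stop = int(parts[0]), int(parts[1])  # int() raises ValueError on bad text
    step = int(parts[2]) if len(parts) == 3 else 1
    if step == 0:
        raise ValueError("step must not be zero")
    num = start
    # Count up for a positive step and down for a negative one.
    while (num <= stop) if step > 0 else (num >= stop):
        yield num
        num += step


print(list(closed_range('1:5:2')))   # [1, 3, 5]
print(list(closed_range('5:1:-2')))  # [5, 3, 1]
print(list(closed_range('2:4')))     # [2, 3, 4]
```

Since this is a generator, the checks run when iteration starts, not when `closed_range` is called.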



Answer 6:

Was going to comment, but it's easier to write code as an answer, so...

I would NOT write a class that redefines slicing, unless it's VERY clear. I have a class that represents ints with bit slicing. In my contexts, '4:2' is very clearly inclusive, and ints don't already have any use for slicing, so it's (barely) acceptable (imho, and some would disagree).

For lists, you have the case that you'll do something like

list1 = [1,2,3,4,5]
list2 = InclusiveList([1,2,3,4,5])

and later on in your code

if list1[4:2] == test_list or list2[4:2] == test_list:

and that is a very easy mistake to make, since list already HAS a well-defined usage. They look identical but act differently, so this will be very confusing to debug, especially if you didn't write it.

That doesn't mean you're completely lost... slicing is convenient, but after all, it's just a function. And you can add that function to anything like this, so this might be an easier way to get to it:

class inc_list(list):
    def islice(self, start, end=None, dir=None):
        return self.__getitem__(slice(start, end+1, dir))

l2 = inc_list([1,2,3,4,5])
l2[1:3]
[0x2,
 0x3]
l2.islice(1,3)
[0x2,
 0x3,
 0x4]

However, this solution, like many others (besides being incomplete, I know), has an Achilles' heel: it's just not as simple as plain slice notation. It's a little simpler than passing the list as an argument, but still harder than just [4:2]. The only way to get that simplicity is to pass something different to the slice that can be interpreted differently, so that the reader knows what was done and the syntax stays just as short.

One possibility... floating point numbers. They're different, so you can see them, and they aren't too much more difficult than the 'simple' syntax. It's not built-in, so there's still some 'magic' involved, but as far as syntactic sugar, it's not bad....

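class inc_list(list):
    def __getitem__(self, x):
        if isinstance(x, slice):
            start, end, step = x.start, x.stop, x.step
            if step is None:
                step = 1
            if isinstance(end, float):
                # A float stop marks the slice as inclusive: move the stop one
                # position further in the direction of travel.
                end = int(end) + (1 if step > 0 else -1)
                x = slice(start, end, step)
        # Plain indices and ordinary slices fall through unchanged.
        return list.__getitem__(self, x)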

l2 = inc_list([1,2,3,4,5])
l2[1:3]
[0x2,
 0x3]
l2[1:3.0]
[0x2,
 0x3,
 0x4]

The 3.0 should be enough to tell any Python programmer 'hey, something unusual is going on there'... not necessarily what is going on, but at least there's no surprise that it acts 'weird'.

Note that there's nothing unique about that to lists... you could easily write a decorator that could do this for any class:

def inclusiveclass(inclass):
    class newclass(inclass):
        def __getitem__(self, x):
            if isinstance(x, slice):
                start, end, step = x.start, x.stop, x.step
                if step is None:
                    step = 1
                if isinstance(end, float):
                    # A float stop marks the slice as inclusive.
                    end = int(end) + (1 if step > 0 else -1)
                    x = slice(start, end, step)
            # Delegate to the wrapped class rather than hard-coding list.
            return inclass.__getitem__(self, x)
    return newclass

ilist = inclusiveclass(list)

or

@inclusiveclass
class inclusivelist(list):
    pass

The first form is probably more useful though.



Answer 7:

Focusing on your request for best syntax, what about targeting:

l[1:UpThrough(5):2]

You can achieve this using the __index__ method:

class UpThrough(object):
    def __init__(self, stop):
        self.stop = stop

    def __index__(self):
        return self.stop + 1

class DownThrough(object):
    def __init__(self, stop):
        self.stop = stop

    def __index__(self):
        return self.stop - 1

Now you don't even need a specialized list class (and don't need to modify global definition either):

>>> l = [1,2,3,4]
>>> l[1:UpThrough(2)]
[2,3]

If you use these a lot, you could use shorter names such as upIncl and downIncl, or even In and InRev.

You can also build out these classes so that, other than use in slice, they act like the actual index:

def __int__(self):
    return self.stop
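One caveat worth illustrating: `__index__` must return an integer, so it cannot return None when the adjusted stop wraps around (UpThrough(-1) gives 0, DownThrough(0) gives -1). The sketch below shows one assumed workaround, which relies on slicing clamping out-of-range indices and returns a very large or very small value in those two cases. The use of `sys.maxsize` here is an illustration, not part of the classes above.

```python
import sys


class UpThrough:
    def __init__(self, stop):
        self.stop = stop

    def __index__(self):
        # -1 + 1 would give 0 (an empty slice), so return "past the end" instead.
        return sys.maxsize if self.stop == -1 else self.stop + 1


class DownThrough:
    def __init__(self, stop):
        self.stop = stop

    def __index__(self):
        # 0 - 1 would give -1 (the last element), so return "before the start".
        return -sys.maxsize if self.stop == 0 else self.stop - 1


items = [1, 2, 3, 4, 5]
print(items[1:UpThrough(3)])       # [2, 3, 4]
print(items[1:UpThrough(-1)])      # [2, 3, 4, 5]
print(items[4:DownThrough(1):-1])  # [5, 4, 3, 2]
print(items[4:DownThrough(0):-1])  # [5, 4, 3, 2, 1]
```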


Answer 8:

It's difficult and probably not wise to overload such basic concepts. With a new inclusivelist class, len(l[a:b]) is b-a+1, which can lead to confusion.
To preserve the natural Python sense while giving readability in a BASIC style, just define:

STEP=FROM=lambda x:x
TO=lambda x:x+1 if x!=-1 else None 
DOWNTO=lambda x:x-1 if x!=0 else None

Then you can write ranges as you like while keeping the natural Python logic:

>>> l=list(range(FROM(0),TO(9)))
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> l[FROM(9):DOWNTO(3):STEP(-2)] == l[9:2:-2]
True


Answer 9:

Instead of creating an unconventional API or extending data types like list, it would be ideal to write a function that wraps the built-in slice, so that you can pass its result anywhere a slice is required. Python supports this approach for some exceptional cases, and your case could warrant such an exception. As an example, an inclusive slice would look like this:

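def islice(start, stop=None, step=None):
    if stop is not None:
        # Move the stop one position further in the direction of travel.
        forward = step is None or step > 0
        stop += 1 if forward else -1
        if stop == (0 if forward else -1):
            stop = None
    return slice(start, stop, step)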

And you can use it with any sequence type:

>>> list(range(1,10))[islice(1,5)]
[2, 3, 4, 5, 6]
>>> "Hello World"[islice(0,5,2)]
'Hlo'
>>> (3,1,4,1,5,9,2,6)[islice(1,-2)]
(1, 4, 1, 5, 9, 2)

Finally, you can also create an inclusive range called irange to complement the inclusive slice (written along the lines of the OP's function).

def irange(start, stop, step):
    return range(start, (stop + 1) if step >= 0 else (stop - 1), step)