Pythonic way to determine whether not-None list entries occur in a single continuous slice

Posted 2019-03-09 06:56

I'm looking for a way to easily determine if all not None items in a list occur in a single continuous slice. I'll use integers as examples of not None items.

For example, the list [None, None, 1, 2, 3, None, None] meets my requirements for continuous integer entries. By contrast, [1, 2, None, None, 3, None] is not continuous, because there are None entries between integers.

Some more examples to make this as clear as possible.

Continuous:
[1, 2, 3, None, None]
[None, None, 1, 2, 3]
[None, 1, 2, 3, None]

Not Continuous:
[None, 1, None, 2, None, 3]
[None, None, 1, None, 2, 3]
[1, 2, None, 3, None, None]

My first approach was to use variables to keep track of whether we had come across a None yet, and whether we had come across an int yet -- this ended up as a highly nested, very hard to follow series of if/else statements embedded in a for loop. (On top of the ugliness, I confess I haven't gotten it to work in every case.)

Anyone know an easier way to figure out if the not None items in a list occur in a single continuous slice?
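For reference, the flag-based single pass described above can be written without deep nesting. This is only a sketch: the name `continuous` is assumed, and it treats a list with no non-None items as continuous.

```python
def continuous(seq):
    """Single pass with two flags: have we entered the run of
    non-None items, and has that run ended again?"""
    seen_item = False   # we are inside or past the non-None run
    run_over = False    # a None appeared after the run started
    for item in seq:
        if item is not None:
            if run_over:
                return False  # a second run is starting
            seen_item = True
        elif seen_item:
            run_over = True   # the run (if any) has ended
    return True
```

The two flags replace the nested if/else chains: `run_over` only matters once `seen_item` is set, so the loop body stays flat.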

10 answers
一夜七次 · 2019-03-09 07:24

The natural way to consume sequence elements is to use dropwhile:

from itertools import dropwhile
def continuous(seq):
    return all(x is None for x in dropwhile(lambda x: x is not None,
                                            dropwhile(lambda x: x is None, seq)))

We can express this without nested function calls:

from itertools import dropwhile
def continuous(seq):
    core = dropwhile(lambda x: x is None, seq)
    remainder = dropwhile(lambda x: x is not None, core)
    return all(x is None for x in remainder)
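A quick check of this version against the question's examples (the function is repeated here so the snippet stands alone):

```python
from itertools import dropwhile

def continuous(seq):
    # Skip leading Nones, then skip the first run of non-None items;
    # the list is continuous iff everything left over is None.
    core = dropwhile(lambda x: x is None, seq)
    remainder = dropwhile(lambda x: x is not None, core)
    return all(x is None for x in remainder)

print(continuous([None, 1, 2, 3, None]))  # True
print(continuous([1, None, 2]))           # False
```

Note that `dropwhile` is lazy, so `remainder` picks up exactly where `core` stopped consuming the underlying iterator.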
一夜七次 · 2019-03-09 07:28

Good ol' itertools.groupby to the rescue:

from itertools import groupby

def contiguous(seq):
    return sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) == 1

gives

>>> contiguous([1,2,3,None,None])
True
>>> contiguous([None, 1,2,3,None])
True
>>> contiguous([None, None, 1,2,3])
True
>>> contiguous([None, 1, None, 2,3])
False
>>> contiguous([None, None, 1, None, 2,3])
False
>>> contiguous([None, 1, None, 2, None, 3])
False
>>> contiguous([1, 2, None, 3, None, None])
False

[edit]

Since there seems to be some discussion in the comments, I'll explain why I like this approach better than some of the others.

We're trying to find out whether there is one contiguous group of non-None objects, and

sum(1 for k,g in groupby(seq, lambda x: x is not None) if k)

counts the number of contiguous non-None groups, using the function in the stdlib which is designed for collecting contiguous groups. As soon as we see groupby, we think "contiguous groups", and vice versa. In that sense, it's self-documenting. This is basically the definition of my goal.
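To make the counting concrete, here is what groupby yields for one of the non-contiguous examples (a small illustrative sketch):

```python
from itertools import groupby

seq = [None, 1, None, 2, 3]
# groupby emits one (key, group) pair per run of equal keys;
# the key here is "is this element non-None?"
keys = [k for k, _ in groupby(seq, lambda x: x is not None)]
print(keys)   # [False, True, False, True]

runs = sum(1 for k in keys if k)
print(runs)   # 2 -> more than one non-None run, so not contiguous
```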

IMHO the only weakness is that it doesn't short-circuit, and that could be fixed, but after thinking about it some I still prefer this as it uses a primitive I like -- "count the number of contiguous non-None groups" -- which I prefer to simply "tell me whether or not there is more than one contiguous non-None group as soon as you can".
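For what it's worth, the short-circuiting fix mentioned above can be sketched by pulling at most two truthy keys from the groupby iterator (the name `contiguous_short_circuit` is assumed; semantics match the original, so an all-None list returns False):

```python
from itertools import groupby, islice

def contiguous_short_circuit(seq):
    # One marker per contiguous run of non-None items, produced lazily.
    runs = (k for k, _ in groupby(seq, lambda x: x is not None) if k)
    # Stop scanning as soon as a second run is found.
    return len(list(islice(runs, 2))) == 1
```

`islice(runs, 2)` never asks groupby to advance past the start of a second non-None run, so a long tail after the second run is never scanned.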

Many of the approaches to implement the last one rely on clever observations about the problem, like "if there's only one contiguous group of not-None objects, then if we scan until we find the first not-None object, and then scan through not-None objects until we find the next None, then whether everything that's left is None gives us our answer." (Or something like that, which is part of my issue: I have to think about it.) To me that feels like using "implementation details" about the problem to solve it, and focuses on properties of the problem we can use to solve it, rather than simply specifying the problem to Python and letting Python do the work.

I'm a bear of very little brain, as the saying has it, and I like to avoid having to be clever, as in my experience it's a route littered with FAIL.

As always, everyone's mileage may vary, of course, and probably in proportion to their cleverness.

我命由我不由天 · 2019-03-09 07:34

I did some profiling to compare @gnibbler's approach with the groupby approach. @gnibbler's approach is consistently faster, especially for longer lists. For example, I see about a 50% performance gain for random inputs with length 3-100, with a 50% chance of containing a single int sequence (randomly selected), and otherwise with random values.

Test code is below. I interspersed the two methods (randomly selecting which one goes first) to make sure any caching effects get cancelled out. Based on this, I'd say that while the groupby approach is more intuitive, @gnibbler's approach may be appropriate if profiling indicates that this is an important part of the overall code to optimize -- in that case, appropriate comments should indicate what's going on with the use of all/any to consume iterator values.

from itertools import groupby
import random, time

def contiguous1(seq):
    # gnibbler's approach
    seq = iter(seq)
    all(x is None for x in seq)        # Burn through any Nones at the beginning
    any(x is None for x in seq)        # and the first group
    return all(x is None for x in seq) # everything else (if any) should be None.

def contiguous2(seq):
    return sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) == 1

times = {'contiguous1':0,'contiguous2':0}

for i in range(400000):
    n = random.randint(3,100)
    items = [None] * n
    if random.randint(0,1):
        s = random.randint(0,n-1)
        e = random.randint(0,n-s)
        for i in range(s,e):
            items[i] = 3
    else:
        for i in range(n):
            if not random.randint(0,2):
                items[i] = 3
    if random.randint(0,1):
        funcs = [contiguous1, contiguous2]
    else:
        funcs = [contiguous2, contiguous1]
    for func in funcs:
        t0 = time.time()
        func(items)
        times[func.__name__] += (time.time()-t0)

print()
for f, t in times.items():
    print('%10.7f %s' % (t, f))
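As an alternative to a hand-rolled timing loop, the stdlib timeit module handles loop overhead for you; a minimal sketch timing the groupby version on one fixed, illustrative input:

```python
import timeit

setup = """
from itertools import groupby

def contiguous2(seq):
    return sum(1 for k, g in groupby(seq, lambda x: x is not None) if k) == 1

data = [None] * 40 + [3] * 20 + [None] * 40
"""
elapsed = timeit.timeit('contiguous2(data)', setup=setup, number=10000)
print('%.6f seconds for 10000 calls' % elapsed)
```

This doesn't replace the randomized comparison above, but it gives a stable per-call baseline for a single input shape.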
chillily · 2019-03-09 07:34

Here's a solution inspired by numpy-style index arithmetic. Get the indices of all the non-None elements. Then compare each index to the one following it. If the difference is greater than one, there are Nones in between the non-None entries. If no pair of adjacent indices differs by more than one, there are no gaps.

def is_continuous(seq):
    non_null_indices = [i for i, obj in enumerate(seq) if obj is not None]
    for i, index in enumerate(non_null_indices[:-1]):
        if non_null_indices[i+1] - index > 1:
            return False
    return True
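The pairwise comparison can also be written with zip, which avoids the manual index bookkeeping; a sketch assuming the same semantics (a list with no non-None entries counts as continuous):

```python
def is_continuous(seq):
    idx = [i for i, obj in enumerate(seq) if obj is not None]
    # Adjacent indices must differ by exactly 1 for a gap-free run.
    return all(b - a == 1 for a, b in zip(idx, idx[1:]))

print(is_continuous([None, 1, 2, 3, None]))  # True
print(is_continuous([1, None, 2]))           # False
```

`zip(idx, idx[1:])` pairs each index with its successor, so the generator inside all() short-circuits at the first gap.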