Count consecutive occurences of values varying in

2020-01-27 06:50发布

问题:

Say I have a bunch of numbers in a numpy array and I test them based on a condition returning a boolean array:

np.random.seed(3456)
a = np.random.rand(8)
condition = a>0.5

And with this boolean array I want to count all of the lengths of consecutive occurences of True. For example if I had [True,True,True,False,False,True,True,False,True] I would want to get back [3,2,1].

I can do that using this code:

length,count = [],0
for i in range(len(condition)):

    if condition[i]==True:
        count += 1
    elif condition[i]==False and count>0:
        length.append(count)
        count = 0

    if i==len(condition)-1 and count>0:
        length.append(count)

    print length

But is there anything already implemented for this or a python,numpy,scipy, etc. function that counts the length of consecutive occurences in a list or array for a given input?

回答1:

Here's a solution using itertools (it's probably not the fastest solution):

import itertools
condition = [True,True,True,False,False,True,True,False,True]
[ sum( 1 for _ in group ) for key, group in itertools.groupby( condition ) if key ]

Out:
[3, 2, 1]


回答2:

If you already have a numpy array, this is probably going to be faster:

>>> condition = np.array([True,True,True,False,False,True,True,False,True])
>>> np.diff(np.where(np.concatenate(([condition[0]],
                                     condition[:-1] != condition[1:],
                                     [True])))[0])[::2]
array([3, 2, 1])

It detects where chunks begin, has some logic for the first and last chunk, and simply computes differences between chunk starts and discards lengths corresponding to False chunks.