Python list comprehension for Numpy

Posted 2019-07-03 21:55

Question:

I'm looking for a list-comprehension method or something similar in NumPy to eliminate the use of a for-loop. For example, index_values is a Python dictionary list of lists (each list containing a different number of index values) and s is a NumPy vector:

for i in range(33):
    s[index_values[i]] += 4.1

Is there a method available that allows eliminating the for-loop?

Answer 1:

I don't fully understand what kind of object index_values is. But if it were an ndarray, or could be converted to an ndarray, you could just do this:

>>> import numpy
>>> s = numpy.arange(20)
>>> index_values = (numpy.random.random((3, 3)) * 20).astype('i')
>>> s[index_values] = 4
>>> s
array([ 0,  1,  4,  4,  4,  5,  6,  4,  8,  4,  4, 11, 12, 
       13,  4, 15,  4,  4,  4, 19])

Edit: But it seems that won't work in this case. On the basis of your edits and comments, here's a method I think might work for you. A random list of lists with varying lengths...

>>> import random
>>> index_values = [list(range(x, x + random.randrange(1, 5)))
...                 for x in [random.randrange(0,50) for y in range(33)]]

...isn't hard to convert into an array:

>>> import itertools
>>> index_value_array = numpy.fromiter(itertools.chain(*index_values),
...                                    dtype='i')

If you know the length of the array (83 for this particular random sample), specify the count for better performance:

>>> index_value_array = numpy.fromiter(itertools.chain(*index_values),
...                                    dtype='i', count=83)
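The count does not have to be hard-coded. As a minimal sketch, assuming a small hypothetical index_values (not the random one above), the total length can be computed from the sublists and passed to numpy.fromiter:

```python
import itertools

import numpy as np

# Hypothetical small stand-in for the question's index_values.
index_values = [[0, 1, 2], [2, 3], [5], [1, 2, 3, 4]]

# Total number of indices across all sublists, computed rather than hard-coded.
total = sum(len(sub) for sub in index_values)

# chain.from_iterable flattens lazily; count lets fromiter preallocate the result.
flat = np.fromiter(
    itertools.chain.from_iterable(index_values),
    dtype=np.intp,
    count=total,
)
print(flat)
```

np.intp is used here because it is NumPy's native index type; the 'i' dtype in the original works as well.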

Since your edit indicates that you want histogram-like behavior, simple indexing won't do, as pointed out by Robert Kern. So use numpy.histogram:

>>> hist = numpy.histogram(index_value_array, bins=range(0, 51))

histogram is really designed for floating-point data: every bin is half-open except the last one, which also includes its right edge. This means the bins have to extend one step further than you might expect, because 48 and 49 would end up in the same bin if we used the more intuitive range(0, 50). The result is a tuple with an array of n counts and an array of n + 1 bin borders:

>>> hist
(array([2, 2, 1, 2, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 5, 5, 5, 3, 3, 
        3, 3, 3, 2, 1, 0, 2, 3, 3, 1, 0, 2, 3, 2, 2, 2, 3, 2, 1, 1, 2, 2, 
        2, 0, 0, 0, 1, 0]), 
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]))
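The bin-edge behavior described above can be checked on a tiny input. This is a sketch with made-up values, chosen only to show what happens at the top of the range:

```python
import numpy as np

# Made-up sample: one 47, one 48, two 49s.
values = np.array([47, 48, 49, 49])

# Edges 0..49 give 49 bins; the last bin is [48, 49], closed on both sides,
# so 48 and 49 are counted together.
counts_short, _ = np.histogram(values, bins=np.arange(0, 50))

# Edges 0..50 give 50 bins; 48 and 49 each get their own bin.
counts_full, _ = np.histogram(values, bins=np.arange(0, 51))

print(len(counts_short), counts_short[-2:])
print(len(counts_full), counts_full[-3:])
```

With the shorter edge list, the last bin collects three values; with the longer one, the counts line up one-to-one with the indices 0 through 49.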

Now we can scale the counts up by a factor of 4.1 and perform vector addition:

>>> s = numpy.arange(50, dtype='f')
>>> hist[0] * 4.1 + s
array([  8.2,   9.2,   6.1,  11.2,   8.1,   5. ,   6. ,   7. ,  12.1,
        13.1,  14.1,  15.1,  16.1,  13. ,  18.1,  19.1,  20.1,  37.5,
        38.5,  39.5,  32.3,  33.3,  34.3,  35.3,  36.3,  33.2,  30.1,
        27. ,  36.2,  41.3,  42.3,  35.1,  32. ,  41.2,  46.3,  43.2,
        44.2,  45.2,  50.3,  47.2,  44.1,  45.1,  50.2,  51.2,  52.2,
        45. ,  46. ,  47. ,  52.1,  49. ])

I have no idea if this suits your purposes, but it seems like a good approach, and it should run at near C speed since it uses only numpy and itertools.
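To see why simple indexing won't do, and to sanity-check the counting approach against the original loop, here is a small sketch. The data is a hypothetical stand-in, and it assumes no sublist repeats an index internally. It also uses numpy.add.at and numpy.bincount, which are not part of the answer above but accumulate over repeated indices in a similar way:

```python
import numpy as np

# Hypothetical stand-in data; index 2 appears in three different sublists.
index_values = [[0, 1, 2], [2, 3], [5], [1, 2, 3, 4]]
n = 6

# Reference result: the loop from the question.
s_loop = np.zeros(n)
for idx in index_values:
    s_loop[idx] += 4.1

flat = np.concatenate([np.asarray(sub, dtype=np.intp) for sub in index_values])

# Buffered fancy indexing: each distinct index is incremented only once,
# no matter how often it appears in flat.
s_fancy = np.zeros(n)
s_fancy[flat] += 4.1

# np.add.at is unbuffered, so repeated indices accumulate.
s_add_at = np.zeros(n)
np.add.at(s_add_at, flat, 4.1)

# np.bincount counts occurrences per index, much like the histogram approach.
s_bincount = np.zeros(n) + 4.1 * np.bincount(flat, minlength=n)

print(np.allclose(s_add_at, s_loop), np.allclose(s_bincount, s_loop))
print(np.allclose(s_fancy, s_loop))
```

On this data the add.at and bincount results match the loop, while the plain `s[flat] += 4.1` version does not, because index 2 only receives a single increment there.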



Answer 2:

What about:

from functools import reduce  # reduce is no longer a builtin in Python 3
s[reduce(lambda x, y: x + y, [index_values[x] for x in range(33)], [])] = 4.1