How to efficiently index into a 1D numpy array via

Posted 2019-05-14 08:03

Question:

I have a big 1D array of data. I have a starts array of indexes into that data where important things happened. I want to get an array of ranges so that I get windows of length L, one for each starting point in starts. Bogus sample data:

data = np.linspace(0,10,50)
starts = np.array([0,10,21])
length = 5

Instinctively, I want to do something like

data[starts:starts+length]

But really, I need to turn starts into a 2D array of range "windows." Coming from functional languages, I would think of it as a map from a list to a list of lists, like:

np.apply_along_axis(lambda i: np.arange(i,i+length), 0, starts)

But that won't work because apply_along_axis only allows scalar return values.

You can do this:

pairs = np.vstack([starts, starts + length]).T
ranges = np.apply_along_axis(lambda p: np.arange(*p), 1, pairs)
data[ranges]

Or you can do it with a list comprehension:

data[np.array([np.arange(i,i+length) for i in starts])]

Or you can do it iteratively. (Bleh.)

Is there a concise, idiomatic way to slice into an array at certain start points like this? (Pardon the numpy newbie-ness.)

Answer 1:

data = np.linspace(0,10,50)
starts = np.array([0,10,21])
length = 5

For a NumPy-only way of doing this, you can use numpy.meshgrid(), as described here:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html

As hpaulj pointed out in the comments, meshgrid isn't actually needed for this problem, because array broadcasting does the same job:

http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

# indices = sum(np.meshgrid(np.arange(length), starts))

indices = np.arange(length) + starts[:, np.newaxis]
# array([[ 0,  1,  2,  3,  4],
#        [10, 11, 12, 13, 14],
#        [21, 22, 23, 24, 25]])
data[indices]

returns

array([[ 0.        ,  0.20408163,  0.40816327,  0.6122449 ,  0.81632653],
       [ 2.04081633,  2.24489796,  2.44897959,  2.65306122,  2.85714286],
       [ 4.28571429,  4.48979592,  4.69387755,  4.89795918,  5.10204082]])
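
As a quick sanity check, here is a minimal self-contained sketch of the broadcasting approach, compared against the list-comprehension version from the question. The names windows and expected are just illustrative. The column of starts has shape (3, 1) and the row of offsets has shape (5,), so their sum broadcasts to a (3, 5) index array.

```python
import numpy as np

data = np.linspace(0, 10, 50)
starts = np.array([0, 10, 21])
length = 5

# (3, 1) column of starts + (5,) row of offsets -> (3, 5) array of indices
indices = starts[:, np.newaxis] + np.arange(length)
windows = data[indices]

# Reference result built one window at a time, as in the question
expected = np.array([data[i:i + length] for i in starts])

print(windows.shape)                      # (3, 5)
print(np.array_equal(windows, expected))  # True
```

Note that every start must satisfy start + length <= len(data); otherwise the fancy index raises an IndexError.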


Answer 2:

If you need to do this many times, you can use as_strided() to create a sliding-window view of data:

data = np.linspace(0,10,50000)
length = 5
starts = np.random.randint(0, len(data)-length, 10000)

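from numpy.lib.stride_tricks import as_strided
# One row per possible start position. Stepping one row or one column
# both advance by a single element, so row i is data[i:i+length].
sliding_window = as_strided(data, (len(data) - length + 1, length),
                            (data.itemsize, data.itemsize))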

Then you can use:

sliding_window[starts]

to get what you want.

It's also faster than creating the index array.
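
One caveat: as_strided does no bounds checking, so a wrong shape or stride can silently read memory outside the array. As a hedged alternative (not part of the original answer), NumPy 1.20 and later provide sliding_window_view, which builds the same kind of view with the shape computed for you and returns it read-only by default. A minimal sketch, assuming NumPy >= 1.20, with a seeded generator used only to make the example reproducible:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20

data = np.linspace(0, 10, 50000)
length = 5
rng = np.random.default_rng(0)
# The upper bound is exclusive, so the largest start is len(data) - length
starts = rng.integers(0, len(data) - length + 1, 10000)

# View of shape (len(data) - length + 1, length); no data is copied here
windows_view = sliding_window_view(data, length)

# Fancy indexing with starts copies out the selected rows
result = windows_view[starts]

# Same result as the broadcasting index array from the first answer
by_index = data[starts[:, np.newaxis] + np.arange(length)]
print(np.array_equal(result, by_index))  # True
```

Whether this is faster than the index-array approach on your data is worth measuring yourself, for example with timeit.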