Slicing repeatedly with the same slice in NumPy

Posted 2019-07-29 17:54

Question:

I have several one-dimensional NumPy arrays (around 5 million elements each).

I have to slice them repeatedly with the same slice. That is, I have a collection of arrays, all of the same size, and I want to index each of them with the same index array (which has the same size as the arrays).

Is there a way to compute A[index] for all the different arrays A that is more efficient than the naive way?

Maybe there’s a way to use Cython to speed things up?

Thank you!

Edit

To make things clearer, this is my setting: I have one array A of several million elements. To perform a certain operation on this array A, I first need to sort it; but then I want to recover the original order, so I un-sort it. I need to repeat this several times. So in short:

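import numpy as np

A = np.random.rand(5_000_000)       # 1-D array; the size must be an int, not 5e6
indices = np.argsort(A)
sortedA = A[indices]
inv_indices = np.argsort(indices)   # inverse permutation: undoes the sort

results = []
for _ in range(100):
    fancy_A = fancy_function(sortedA)  # returns an array with the same dimensions
    res = fancy_A[inv_indices]         # restore the original order
    results.append(res)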

I want to optimize the code inside the loop. As you can see, inv_indices is always the same, and I thought that there may be a more efficient way of doing that.

Thanks!

Answer 1:

Since inv_indices reorders the array rather than selecting a subset, it is probably just as fast, and as space-efficient, to collect the fancy_A arrays into one big array and index that once.

results = []
for _ in range(100):
    fancy_A = fancy_function(sortedA) #returns an array with the same dimensions
    #res = fancy_A[inv_indices]
    results.append(fancy_A)

bigA = np.stack(results)
bigA = bigA[:, inv_indices]    # assumes inv_indices is a list or array
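
A minimal sketch to check that indexing the stack once gives the same result as indexing inside the loop. The fancy_function here is a hypothetical stand-in (a scaled cumulative sum with an extra argument k), not the asker's function, and the array is kept small. At the sizes in the question, a stack of 100 float64 arrays of 5 million elements takes about 4 GB, and the indexed result is another copy of that size, so whether this is practical depends on available memory.

```python
import numpy as np

def fancy_function(x, k):
    # Hypothetical stand-in for the asker's function:
    # returns an array with the same shape as x.
    return np.cumsum(x) * (k + 1)

rng = np.random.default_rng(0)
A = rng.random(1000)                 # small array for illustration
indices = np.argsort(A)
sortedA = A[indices]
inv_indices = np.argsort(indices)    # inverse permutation

# Approach from the question: un-sort inside the loop.
per_iter = [fancy_function(sortedA, k)[inv_indices] for k in range(5)]

# Approach from this answer: collect first, un-sort all rows at once.
stacked = np.stack([fancy_function(sortedA, k) for k in range(5)])
stacked = stacked[:, inv_indices]

same = all(np.array_equal(a, b) for a, b in zip(per_iter, stacked))
print(same, stacked.shape)
```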

If fancy_A is 1-D and inv_indices is a simple list or 1-D array, applying it to the stack is straightforward:

In [849]: A = np.random.randint(0,10,10)
In [850]: A
Out[850]: array([0, 1, 5, 7, 4, 4, 0, 6, 9, 1])
In [851]: idx = np.argsort(A)
In [852]: idx
Out[852]: array([0, 6, 1, 9, 4, 5, 2, 7, 3, 8], dtype=int32)
In [853]: A[idx]
Out[853]: array([0, 0, 1, 1, 4, 4, 5, 6, 7, 9])
In [854]: res = [A for _ in range(5)]
In [855]: res = np.stack([A for _ in range(5)])
In [856]: res
Out[856]: 
array([[0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1]])
In [857]: res[:,idx]
Out[857]: 
array([[0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9]])
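
The session above applies the sorting index to every row. The un-sort step from the question works the same way with the inverse permutation. As a sketch, the inverse can also be built with a scatter assignment instead of a second argsort; the names inv_sort, inv_scatter and restored are only illustrative.

```python
import numpy as np

A = np.array([0, 1, 5, 7, 4, 4, 0, 6, 9, 1])
idx = np.argsort(A, kind="stable")

# Inverse permutation, two equivalent ways.
inv_sort = np.argsort(idx)                  # as in the question
inv_scatter = np.empty_like(idx)
inv_scatter[idx] = np.arange(idx.size)      # scatter: O(n), no second sort

# Stack of sorted rows, then un-sort every row in one indexing call.
res = np.stack([A[idx] for _ in range(5)])
restored = res[:, inv_scatter]

print(np.array_equal(inv_sort, inv_scatter))
print(np.array_equal(restored[0], A))
```

Both ways give the same index array, so the choice only affects the one-time setup cost, not the cost of the indexing inside the loop.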

On the time it takes to index a whole array (in these timings, A[idx] takes roughly ten times as long as a plain A.copy()):

In [860]: A = np.random.randint(0,1000,100000)
In [861]: idx = np.argsort(A)
In [862]: timeit A.copy()
31.8 µs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [863]: timeit A[idx]
332 µs ± 9.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
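
Outside IPython, the same comparison can be sketched with the standard timeit module. The np.take call with a preallocated out buffer is added here only as something to measure, on the assumption that avoiding a fresh allocation might help. It is not part of the timings above, and the numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 1000, 100_000)
idx = np.argsort(A)
out = np.empty_like(A)               # reusable output buffer for np.take

n = 200
t_copy = timeit.timeit(lambda: A.copy(), number=n)
t_fancy = timeit.timeit(lambda: A[idx], number=n)
t_take = timeit.timeit(lambda: np.take(A, idx, out=out), number=n)

# Report the mean time per call in microseconds.
for name, t in [("A.copy()", t_copy), ("A[idx]", t_fancy), ("np.take(out=)", t_take)]:
    print(f"{name:>14}: {t / n * 1e6:.1f} µs per call")
```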