I have several one-dimensional NumPy arrays (around 5 million elements each).
I have to index them repeatedly with the same index. That is, I have a collection of arrays (all of the same dimensions) and I want to index them with the same index array (which has the same dimension as the arrays).
Is there a way to compute A[index] for all the different arrays A that is more efficient than the naive way?
Maybe there’s a way to use Cython to speed things up?
Thank you!
Edit
To make things clearer, this is my setting: I have one array A of several million elements. To perform a certain operation on this array A, I first need to sort it; but then I want to recover the original order, so I un-sort it. I need to repeat this several times. So in short:
import numpy as np
A = np.random.rand(5_000_000)  # 1-D array; rand() requires integer sizes
indices = np.argsort(A)
sortedA = A[indices]
inv_indices = np.argsort(indices)
results = []
for _ in range(100):
    fancy_A = fancy_function(sortedA)  # returns an array with the same dimensions
    res = fancy_A[inv_indices]
    results.append(res)
I want to optimize the code inside the loop. As you can see, inv_indices is always the same, and I thought that there may be a more efficient way of doing that.
Thanks!
Since inv_indices reorders the array rather than selecting a subset, it is probably just as fast, and just as space efficient, to collect the fancy_A results into one big array and index that.
results = []
for _ in range(100):
    fancy_A = fancy_function(sortedA)  # returns an array with the same dimensions
    # res = fancy_A[inv_indices]
    results.append(fancy_A)
bigA = np.stack(results)
bigA = bigA[:, inv_indices]  # assumes inv_indices is a list or array
If fancy_A is 1d and inv_indices is a simple list, then applying it to the stack is straightforward:
In [849]: A = np.random.randint(0,10,10)
In [850]: A
Out[850]: array([0, 1, 5, 7, 4, 4, 0, 6, 9, 1])
In [851]: idx = np.argsort(A)
In [852]: idx
Out[852]: array([0, 6, 1, 9, 4, 5, 2, 7, 3, 8], dtype=int32)
In [853]: A[idx]
Out[853]: array([0, 0, 1, 1, 4, 4, 5, 6, 7, 9])
In [854]: res = [A for _ in range(5)]
In [855]: res = np.stack([A for _ in range(5)])
In [856]: res
Out[856]:
array([[0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1]])
In [857]: res[:,idx]
Out[857]:
array([[0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9]])
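To tie this back to the loop in the question, here is a minimal sketch that checks the stacked approach gives the same result as indexing inside the loop. The fancy_function below is a hypothetical stand-in (the real one is not shown in the question), and the array is kept small so the check runs quickly:

```python
import numpy as np

def fancy_function(sorted_values, step):
    # Hypothetical stand-in for the question's fancy_function:
    # any operation that returns an array of the same shape would do.
    return np.cumsum(sorted_values) + step

rng = np.random.default_rng(0)
A = rng.random(1000)               # small 1-D array, for illustration only
indices = np.argsort(A)
sortedA = A[indices]
inv_indices = np.argsort(indices)  # inverse permutation of `indices`

# Naive version: apply inv_indices inside the loop.
per_iteration = [fancy_function(sortedA, i)[inv_indices] for i in range(5)]

# Stacked version: collect the rows first, then reorder the columns once.
bigA = np.stack([fancy_function(sortedA, i) for i in range(5)])
stacked = bigA[:, inv_indices]

same = np.array_equal(stacked, np.stack(per_iteration))
roundtrip = np.array_equal(sortedA[inv_indices], A)  # un-sorting recovers A
print(same, roundtrip)
```

Keep in mind that stacking 100 arrays of 5 million float64 values needs about 4 GB before the indexing makes another copy, so the stacked version may not fit in memory at full size.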
On the time it takes to index a whole array:
In [860]: A = np.random.randint(0,1000,100000)
In [861]: idx = np.argsort(A)
In [862]: timeit A.copy()
31.8 µs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [863]: timeit A[idx]
332 µs ± 9.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
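So a full reordering costs several times more than a plain copy, at least on this machine. If you want to compare alternatives on your own data, something like the following sketch may help. It times plain fancy indexing, np.take, and a scatter assignment through the forward indices (which avoids needing inv_indices at all). The function names are mine, and the relative timings will depend on array size, dtype, and hardware, so I have not included any numbers:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100_000)
indices = np.argsort(x)
inv_indices = np.argsort(indices)
sorted_x = x[indices]

def by_fancy_index():
    # The approach from the question: gather with the inverse permutation.
    return sorted_x[inv_indices]

def by_take():
    # np.take performs the same gather through a function call.
    return np.take(sorted_x, inv_indices)

def by_scatter():
    # Scatter with the forward permutation: out[indices[k]] = sorted_x[k].
    out = np.empty_like(sorted_x)
    out[indices] = sorted_x
    return out

for fn in (by_fancy_index, by_take, by_scatter):
    assert np.array_equal(fn(), x)  # every variant restores the original order
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e6:.1f} µs per call")
```

All three produce the same array, so whichever is fastest for your sizes can be dropped into the loop unchanged.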