I am trying to get a fast vectorized version of the following loop:
for i in xrange(N1):
    A[y[i]] -= B[i, :]
Here A.shape = (N2, N3), y.shape = (N1,) with y taking values in [0, N2), and B.shape = (N1, N3). You can think of the entries of y as indices into the rows of A. Here N1 is large, N2 is pretty small and N3 is smallish.
I thought simply doing
A[y] -= B
would work, but the issue is that there are repeated entries in y and this does not do the right thing (i.e., if y = [1, 1] then A[1] is only updated once, not twice). Also, this does not seem to be any faster than the unvectorized for loop.
Is there a better way of doing this?
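To make the repeated-index problem concrete, here is a tiny demonstration (the array values are made up for illustration):

```python
import numpy as np

# Fancy-indexed in-place subtraction is buffered: each row of A is
# written at most once, even when an index repeats in y.
A = np.zeros((2, 3))
y = np.array([1, 1])   # index 1 appears twice
B = np.ones((2, 3))

A[y] -= B              # A[1] is decremented only once
print(A[1])            # [-1. -1. -1.], not [-2. -2. -2.]

# np.subtract.at performs the unbuffered operation, matching the loop.
A2 = np.zeros((2, 3))
np.subtract.at(A2, y, B)
print(A2[1])           # [-2. -2. -2.]
```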
EDIT: YXD linked to this answer in the comments, which at first seems to fit the bill. It would seem you can do exactly what I want with
np.subtract.at(A, y, B)
and it does work; however, when I run it, it is significantly slower than the unvectorized version. So the question remains: is there a more performant way of doing this?
EDIT2: An example, to make things concrete:
n1,n2,n3 = 10000, 10, 500
A = np.random.rand(n2,n3)
y = np.random.randint(n2, size=n1)
B = np.random.rand(n1,n3)
The for loop, when run using %timeit in ipython, gives on my machine:
10 loops, best of 3: 19.4 ms per loop
The subtract.at version produces the same value for A in the end, but is much slower:
1 loops, best of 3: 444 ms per loop
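Since n2 is small, one alternative worth benchmarking (a sketch of my own, not something established above) is to first sum the rows of B that share an index, e.g. via a one-hot scatter matrix product, so each row of A is touched only once:

```python
import numpy as np

rng = np.random.RandomState(0)
n1, n2, n3 = 10000, 10, 500
A = rng.rand(n2, n3)
y = rng.randint(n2, size=n1)
B = rng.rand(n1, n3)

# Reference: unbuffered scatter-subtract, equivalent to the for loop.
A_ref = A.copy()
np.subtract.at(A_ref, y, B)

# Sketch: build an (n1, n2) one-hot matrix from y, so that
# onehot.T @ B has shape (n2, n3) and row i holds the sum of all
# rows B[j] with y[j] == i; then subtract once per row of A.
onehot = (y[:, None] == np.arange(n2)[None, :]).astype(B.dtype)
A_alt = A - onehot.T @ B

print(np.allclose(A_ref, A_alt))  # True
```

This trades the scatter for a dense matrix product, which may or may not be faster depending on n1, n2 and the BLAS in use; it is only a candidate to time, not a guaranteed win.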