I have this piece of code in Python
for i in range(len(ax)):
for j in range(len(rx)):
x = ax[i] + rx[j]
y = ay[i] + ry[j]
A[x,y] = A[x,y] + 1
where
A.shape = (N,M)
ax.shape = ay.shape = (L)
rx.shape = ry.shape = (K)
I wanted to vectorize or otherwise make it more efficient, i.e. faster, and if possible more economical in memory consumption. Here, my ax and ay refer to the absolute elements of an array A, while rx and ay are relative coordinates. So, I'm updating the counter array A.
My table A can be 1000x1000, while ax,ay are 100x1 and cx,cy are 300x1. The whole thing's inside the loop, preferably the optimized code doesn't keep creating big tables of A size.
This question is related to the one I asked before, but it's not directly applicable to this situation due to the way increment works. Here's an example.
This code does exactly what I want:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0,0])
ry = np.array([0,0,0])
for i in range(len(ax)):
for j in range(len(rx)):
x = ax[i] + rx[j]
y = ay[i] + ry[j]
print(x,y)
A[x,y] = A[x,y] + 1
A
array([[ 0., 1., 0., 0., 0.],
[ 0., 3., 0., 0., 0.],
[ 0., 2., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
However, the following code doesn't work, because when we're incrementing an array, it pre-calculates the right side with the array:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0])
ry = np.array([0,0])
x = ax + rx[:,np.newaxis]
y = ay + ry[:,np.newaxis]
A[x,y] = A[x,y] + 1
A
array([[ 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
This solution works in terms of the correctness of numbers, but it's not the fastest, probably, because of np.add.at() function is not buffered:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0,0])
ry = np.array([0,0,0])
x = ax + rx[:,np.newaxis]
y = ay + ry[:,np.newaxis]
np.add.at(A,[x,y],1)
A
Here's one leveraging
broadcasting
, getting linear indices, which are then fed to the very efficientnp.bincount
for binned summations -Alternative one with
np.flatnonzero
-If you are adding into
A
iteratively, replace=
with+=
there at the last step.