In the following code I have written two methods that, in my mind, should do the same thing. Unfortunately they don't, and I can't figure out from the NumPy documentation why they behave differently.
import numpy as np
dW = np.zeros((20, 10))      # accumulator with 10 columns
y = [1 for _ in range(100)]  # 100 column indices, all equal to 1
X = np.ones((100, 20))       # 100 rows, each of length 20
# ===================
# Method 1 (works!)
# ===================
for i in range(len(y)):
dW[:, y[i]] -= X[i]
# ===================
# Method 2 (does not work)
# ===================
dW[:, y] -= X.T
This is a column-wise version of this question.
The answer there can be adapted to work column-wise as follows:
Approach 1: np.<ufunc>.at
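For example, a sketch using np.subtract.at with a slice in the index tuple so that whole columns are addressed; the updates are unbuffered, so repeated indices in y accumulate:

# unbuffered equivalent of dW[:, y] -= X.T
np.subtract.at(dW, (slice(None), y), X.T)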
Approach 2: np.bincount
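For example, a sketch (variable names follow the question): the per-column totals of X are gathered with a single np.bincount call over flattened indices of dW and then subtracted in one step.

yi = np.asarray(y)
n_rows, n_cols = dW.shape
# flat (row-major) index of dW element (k, y[i]) for every feature k and sample i
flat_idx = (np.arange(n_rows)[:, None] * n_cols + yi[None, :]).ravel()
totals = np.bincount(flat_idx, weights=X.T.ravel(), minlength=dW.size)
dW -= totals.reshape(dW.shape)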
Please note that the bincount-based solution, even though it involves more steps, is faster by a factor of ~6.
I found a third solution for this problem: normal matrix multiplication.
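A sketch of the idea, using the shapes from the question: build a one-hot indicator matrix from y; a single matrix product then accumulates the per-column sums that the loop subtracts one row at a time.

ind = np.zeros((len(y), dW.shape[1]))  # one-hot indicator, shape (100, 10)
ind[np.arange(len(y)), y] = 1
dW -= X.T @ ind                        # (20, 100) @ (100, 10) -> (20, 10)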
I did some experiments using the methods proposed above on some neural-network data. In my example X.shape = (500, 3073), W.shape = (3073, 10) and ind.shape = (500, 10). The subtract version takes about 0.2 s (slowest), the matrix-multiplication method 0.01 s (fastest), the normal loop 0.015 s, and the bincount method 0.04 s. Note that in the question y is a vector of ones; that is not my case. The case with only ones can be solved with a simple sum.
Option 1:
This works because you loop through the indices one at a time, so each subtraction operates on the value left by the previous iteration.
Option 2:
It does not work because the updates to the repeated index happen "at the same time" rather than one after another: NumPy reads dW[:, 1], subtracts, and writes the result back, so the repeated updates overwrite each other. Initially everything is 0, so each element ends up as -1 instead of -100.
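A small sketch that reproduces the difference on the shapes from the question (y and X as defined above):

dW1 = np.zeros((20, 10))
dW2 = np.zeros((20, 10))
# serial updates: each subtraction sees the previous result
for i in range(len(y)):
    dW1[:, y[i]] -= X[i]
# buffered fancy-indexed update: repeated indices overwrite each other
dW2[:, y] -= X.T
print(dW1[0, 1])  # -100.0
print(dW2[0, 1])  # -1.0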
As indicated, in principle you cannot operate multiple times over the same element in a single operation, due to how buffering works in NumPy. For that purpose there is the at method, which is available on just about any standard NumPy ufunc (np.add, np.subtract, etc.). For your case, you can do:
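# one possibility: ufunc.at applies the updates unbuffered, so the
# repeated column indices in y accumulate instead of overwriting each other
np.subtract.at(dW, (slice(None), y), X.T)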