In the following code I have written 2 methods that theoretically(in my mind) should do the same thing. Unfortunately they don't, I am unable to find out why they don't do the same thing per the numpy documentation.
import numpy as np
dW = np.zeros((20, 10))
y = [1 for _ in range(100)]
X = np.ones((100, 20))
# ===================
# Method 1 (works!)
# ===================
for i in range(len(y)):
dW[:, y[i]] -= X[i]
# ===================
# Method 2 (does not work)
# ===================
dW[:, y] -= X.T
As indicated, in principle you cannot operate multiple times over the same element in a single operation, due to how buffering works in NumPy. For that purpose there is the at
function, which can be used on about any standard NumPy function (add
, subtract
, etc.). For your case, you can do:
import numpy as np
dW = np.zeros((20, 10))
y = [1 for _ in range(100)]
X = np.ones((100, 20))
# at modifies in place dW, does not return a new array
np.subtract.at(dW, (slice(None), y), X.T)
This is a column-wise version of this question.
The answer there can be adapted to work column-wise as follows:
Approach 1: np.<ufunc>.at
>>> np.subtract.at(dW, (slice(None), y), X.T)
Approach 2: np.bincount
>>> m, n = dW.shape
>>> dW -= np.bincount(np.add.outer(np.arange(m) * n, y).ravel(), (X.T).ravel(), dW.size).reshape(m, n)
Please note that the bincount
based solution - even though it involves more steps - is faster by a factor of ~6.
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=5000)
>>>
>>> repeat('np.subtract.at(dW, (slice(None), y), X.T); np.add.at(dW, (slice(None), y), X.T)', **kwds)
[1.590626839082688, 1.5769231889862567, 1.5802007300080732]
>>> repeat('_= dW; _ -= np.bincount(np.add.outer(np.arange(m) * n, y).ravel(), (X.T).ravel(), dW.size).reshape(m, n); _ += np.bincount(np.add.outer(np.arange(m) * n, y).ravel(), (X.T).ravel(), dW.size).reshape(m, n)', **kwds)
[0.2582490430213511, 0.25572817400097847, 0.25478115503210574]
Option 1:
for i in range(len(y)):
dW[:, y[i]] -= X[i]
This works because you are looping through and updating value which was updated last time.
Option 2:
dW[:, [1,1,1,1,....1,1,1]] -= [[1,1,1,1...1],
[1,1,1,1...1],
.
.
[1,1,1,1...1]]
It does not work because update happens to 1st index at same time in parallel not in serial way. Initially all are 0 so subtracting results in -1.
I found a third solution for this problem. Normal Matrix multiplication:
ind = np.zeros((X.shape[0],dW.shape[1]))
ind[range(X.shape[0]),y] = -1
dW = X.T.dot(ind)
I did some experiment using the proposed methods above on some neural network data. In my example X.shape = (500,3073)
, W.shape = (3073,10)
and ind.shape = (500,10)
.
The subtract version takes about 0.2 second (slowest). The Matrix multiplication method 0.01 s (fastest). Normal loop 0.015 and then bincount
method 0.04 s. Note that in the question y
is vector of ones. This is not my case. The case with only ones can be solved with a simple sum.