Numpy “:” operator broadcasting issues

2019-07-09 00:58发布

In the following code I have written 2 methods that theoretically(in my mind) should do the same thing. Unfortunately they don't, I am unable to find out why they don't do the same thing per the numpy documentation.

import numpy as np


dW = np.zeros((20, 10))
y = [1 for _ in range(100)]
X =  np.ones((100, 20))

# ===================
# Method 1  (works!)
# ===================
for i in range(len(y)):
  dW[:, y[i]] -=  X[i]


# ===================
# Method 2 (does not work)
# ===================
dW[:, y] -=  X.T

4条回答
\"骚年 ilove
2楼-- · 2019-07-09 01:38

This is a column-wise version of this question.

The answer there can be adapted to work column-wise as follows:

Approach 1: np.<ufunc>.at

>>> np.subtract.at(dW, (slice(None), y), X.T)

Approach 2: np.bincount

>>> m, n = dW.shape
>>> dW -= np.bincount(np.add.outer(np.arange(m) * n, y).ravel(), (X.T).ravel(), dW.size).reshape(m, n)

Please note that the bincount based solution - even though it involves more steps - is faster by a factor of ~6.

>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=5000)
>>>
>>> repeat('np.subtract.at(dW, (slice(None), y), X.T); np.add.at(dW, (slice(None), y), X.T)', **kwds)
[1.590626839082688, 1.5769231889862567, 1.5802007300080732]
>>> repeat('_= dW; _ -= np.bincount(np.add.outer(np.arange(m) * n, y).ravel(), (X.T).ravel(), dW.size).reshape(m, n); _ += np.bincount(np.add.outer(np.arange(m) * n, y).ravel(), (X.T).ravel(), dW.size).reshape(m, n)', **kwds)
[0.2582490430213511, 0.25572817400097847, 0.25478115503210574]
查看更多
贼婆χ
3楼-- · 2019-07-09 01:40

I found a third solution for this problem. Normal Matrix multiplication:

ind = np.zeros((X.shape[0],dW.shape[1]))
ind[range(X.shape[0]),y] = -1
dW = X.T.dot(ind)

I did some experiment using the proposed methods above on some neural network data. In my example X.shape = (500,3073), W.shape = (3073,10) and ind.shape = (500,10).

The subtract version takes about 0.2 second (slowest). The Matrix multiplication method 0.01 s (fastest). Normal loop 0.015 and then bincountmethod 0.04 s. Note that in the question y is vector of ones. This is not my case. The case with only ones can be solved with a simple sum.

查看更多
Evening l夕情丶
4楼-- · 2019-07-09 01:46

Option 1:

for i in range(len(y)):
  dW[:, y[i]] -=  X[i]

This works because you are looping through and updating value which was updated last time.

Option 2:

dW[:, [1,1,1,1,....1,1,1]] -=  [[1,1,1,1...1],
                                [1,1,1,1...1],
                                .
                                .
                                [1,1,1,1...1]]

It does not work because update happens to 1st index at same time in parallel not in serial way. Initially all are 0 so subtracting results in -1.

查看更多
唯我独甜
5楼-- · 2019-07-09 01:49

As indicated, in principle you cannot operate multiple times over the same element in a single operation, due to how buffering works in NumPy. For that purpose there is the at function, which can be used on about any standard NumPy function (add, subtract, etc.). For your case, you can do:

import numpy as np

dW = np.zeros((20, 10))
y = [1 for _ in range(100)]
X =  np.ones((100, 20))
# at modifies in place dW, does not return a new array
np.subtract.at(dW, (slice(None), y), X.T)
查看更多
登录 后发表回答