In the following code I have written two methods that, in my mind, should do the same thing. Unfortunately they don't, and I can't figure out from the NumPy documentation why they behave differently.
import numpy as np
dW = np.zeros((20, 10))      # accumulator with 10 columns
y = [1 for _ in range(100)]  # 100 column indices, all equal to 1
X = np.ones((100, 20))       # 100 rows, each of length 20
# ===================
# Method 1 (works!)
# ===================
for i in range(len(y)):
dW[:, y[i]] -= X[i]
# ===================
# Method 2 (does not work)
# ===================
dW[:, y] -= X.T
This is a column-wise version of this question.
The answer there can be adapted to work column-wise as follows:
Approach 1: np.<ufunc>.at
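For example, a sketch using np.subtract.at with a slice in the index tuple so that whole columns are addressed; the updates are unbuffered, so repeated indices in y accumulate:

# unbuffered equivalent of dW[:, y] -= X.T
np.subtract.at(dW, (slice(None), y), X.T)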
Approach 2: np.bincount
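For example, a sketch (variable names follow the question): the per-column totals of X are gathered with a single np.bincount call over flattened indices of dW and then subtracted in one step.

yi = np.asarray(y)
n_rows, n_cols = dW.shape
# flat (row-major) index of dW element (k, y[i]) for every feature k and sample i
flat_idx = (np.arange(n_rows)[:, None] * n_cols + yi[None, :]).ravel()
totals = np.bincount(flat_idx, weights=X.T.ravel(), minlength=dW.size)
dW -= totals.reshape(dW.shape)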
Please note that the bincount-based solution, even though it involves more steps, is faster by a factor of ~6.
I found a third solution for this problem: normal matrix multiplication.
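A sketch of the idea, using the shapes from the question: build a one-hot indicator matrix from y; a single matrix product then accumulates the per-column sums that the loop subtracts one row at a time.

ind = np.zeros((len(y), dW.shape[1]))  # one-hot indicator, shape (100, 10)
ind[np.arange(len(y)), y] = 1
dW -= X.T @ ind                        # (20, 100) @ (100, 10) -> (20, 10)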
I did some experiments using the methods proposed above on some neural-network data. In my example X.shape = (500, 3073), W.shape = (3073, 10) and ind.shape = (500, 10). The subtract version takes about 0.2 s (slowest), the matrix-multiplication method 0.01 s (fastest), the normal loop 0.015 s, and the bincount method 0.04 s. Note that in the question y is a vector of ones; that is not my case. The case with only ones can be solved with a simple sum.
Option 1:
This works because you loop through the indices one at a time, so each subtraction operates on the value left by the previous iteration.
Option 2:
It does not work because the updates to the repeated index happen "at the same time" rather than one after another: NumPy reads dW[:, 1], subtracts, and writes the result back, so the repeated updates overwrite each other. Initially everything is 0, so each element ends up as -1 instead of -100.
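A small sketch that reproduces the difference on the shapes from the question (y and X as defined above):

dW1 = np.zeros((20, 10))
dW2 = np.zeros((20, 10))
# serial updates: each subtraction sees the previous result
for i in range(len(y)):
    dW1[:, y[i]] -= X[i]
# buffered fancy-indexed update: repeated indices overwrite each other
dW2[:, y] -= X.T
print(dW1[0, 1])  # -100.0
print(dW2[0, 1])  # -1.0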
As indicated, in principle you cannot operate multiple times over the same element in a single operation, due to how buffering works in NumPy. For that purpose there is the at method, which is available on just about any standard NumPy ufunc (np.add, np.subtract, etc.). For your case, you can do:
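# one possibility: ufunc.at applies the updates unbuffered, so the
# repeated column indices in y accumulate instead of overwriting each other
np.subtract.at(dW, (slice(None), y), X.T)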