Confusion about numpy's apply along axis and l

Alright, so I apologize ahead of time if I'm just asking something silly, but I really thought I understood how apply_along_axis worked. I just ran into something that might be an edge case that I just didn't consider, but it's baffling me. In short, this is the code that is confusing me:

import numpy as np

class Leaf(object):

    def __init__(self, location):
        self.location = location

    def __len__(self):
        return self.location.shape[0]

def bulk_leaves(child_array, axis=0):
    test = np.array([Leaf(location) for location in child_array])  # This is what I want
    check = np.apply_along_axis(Leaf, 0, child_array)  # This returns an array of individual leafs with the same shape as child_array
    return test, check

if __name__ == "__main__":
    test, check = bulk_leaves(np.random.rand(100, 50))
    test == check  # False

I always feel silly using a list comprehension with numpy and then casting back to an array, but I'm just not sure of another way to do this. Am I just missing something obvious?

Tags: python object numpy vectorization

2 Answers

劫难

Reply #2 · 2019-06-08 00:50

apply_along_axis is written in pure Python, so you can read its source and work out what it does yourself. In this case it essentially does:

check = np.empty(child_array.shape,dtype=object)
for i in range(child_array.shape[1]):
    check[:,i] = Leaf(child_array[:,i])

In other words, it preallocates the container array, and then fills in the values with an iteration. That certainly is better than appending to the array, but rarely better than appending values to a list (which is what the comprehension is doing).

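You could take the above template and adjust it to produce the array that you really want: a 1-D object array with one Leaf per row.

check = np.empty(child_array.shape[0], dtype=object)  # one slot per row
for i in range(check.shape[0]):
    check[i] = Leaf(child_array[i, :])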

In quick tests this iteration times the same as the comprehension. apply_along_axis, besides giving the wrong result here, is slower.
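For reference, here is the preallocate-and-fill approach as a self-contained sketch. The Leaf class is copied from the question; the helper name leaves_by_row is made up for illustration, and this is only one way to build the array, not the only one.

```python
import numpy as np


class Leaf(object):
    """Same minimal wrapper as in the question."""

    def __init__(self, location):
        self.location = location

    def __len__(self):
        return self.location.shape[0]


def leaves_by_row(child_array):
    # Hypothetical helper: one Leaf per row, stored in a 1-D object array.
    out = np.empty(child_array.shape[0], dtype=object)
    for i in range(child_array.shape[0]):
        # Integer indexing stores the Leaf itself in slot i.
        out[i] = Leaf(child_array[i, :])
    return out


child_array = np.random.rand(100, 50)
leaves = leaves_by_row(child_array)
print(leaves.shape)    # (100,)
print(len(leaves[0]))  # 50
```

Assigning with a single integer index puts the object into the slot as-is, so each Leaf ends up wrapping one row of length 50.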


劳资没心，怎么记你

Reply #3 · 2019-06-08 01:05

The problem seems to be that apply_along_axis uses isscalar to determine whether the returned object is a scalar, but isscalar returns False for user-defined classes. The documentation for apply_along_axis says:

The shape of outarr is identical to the shape of arr, except along the axis dimension, where the length of outarr is equal to the size of the return value of func1d.

Since your class's __len__ returns the length of the array it wraps, numpy "expands" the resulting array into the original shape. If you don't define a __len__, you'll get an error, because numpy doesn't think user-defined types are scalars, so it will still try to call len on it.

As far as I can see, there is no way to make this work with a user-defined class. You can return 1 from __len__, but then you'll still get an Nx1 2D result, not a 1D array of length N. I don't see any way to make Numpy see a user-defined instance as a scalar.

There is a numpy bug about the apply_along_axis behavior, but surprisingly I can't find any discussion of the underlying issue that isscalar returns False for non-numpy objects. It may be that numpy just decided to punt and not guess whether user-defined types are vector or scalar. Still, it might be worth asking about this on the numpy list, as it seems odd to me that things like isscalar(object()) return False.

However, if as you say you don't care about performance anyway, it doesn't really matter. Just use your first way with the list comprehension, which already does what you want.
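If you want to check the isscalar behavior mentioned above for yourself, a small sketch like the following should do. The Leaf class is the one from the question; the scalar_checks dictionary is just an illustrative name. It only shows what np.isscalar reports and makes no claim about how any particular NumPy version implements apply_along_axis internally.

```python
import numpy as np


class Leaf(object):
    def __init__(self, location):
        self.location = location

    def __len__(self):
        return self.location.shape[0]


leaf = Leaf(np.zeros(5))

# np.isscalar recognises Python and NumPy numeric scalars,
# but not arbitrary user-defined instances.
scalar_checks = {
    "python float": np.isscalar(3.0),
    "numpy float64": np.isscalar(np.float64(3.0)),
    "object()": np.isscalar(object()),
    "Leaf instance": np.isscalar(leaf),
}

for name, result in scalar_checks.items():
    print(name, result)
```

The first two entries come out truthy and the last two falsy, which is consistent with the explanation that a Leaf is not treated as a scalar.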

