Confusion about numpy's apply along axis and l

2019-06-08 00:42发布

问题:

Alright, so I apologize ahead of time if I'm just asking something silly, but I really thought I understood how apply_along_axis worked. I just ran into something that might be an edge case that I just didn't consider, but it's baffling me. In short, this is the code that is confusing me:

class Leaf(object):

    def __init__(self, location):
        self.location = location

    def __len__(self):
        return self.location.shape[0]

def bulk_leaves(child_array, axis=0):
    test = np.array([Leaf(location) for location in child_array])  # This is what I want
    check = np.apply_along_axis(Leaf, 0, child_array)  # This returns an array of individual leafs with the same shape as child_array
    return test, check

if __name__ == "__main__":
    test, check = bulk_leaves(np.random.ran(100, 50))
    test == check  # False

I always feel silly using a list comprehension with numpy and then casting back to an array, but I'm just nor sure of another way to do this. Am I just missing something obvious?

回答1:

The problem seems to be that apply_along_axis uses isscalar to determine whether the returned object is a scalar, but isscalar returns False for user-defined classes. The documentation for apply_along_axis says:

The shape of outarr is identical to the shape of arr, except along the axis dimension, where the length of outarr is equal to the size of the return value of func1d.

Since your class's __len__ returns the length of the array it wraps, numpy "expands" the resulting array into the original shape. If you don't define a __len__, you'll get an error, because numpy doesn't think user-defined types are scalars, so it will still try to call len on it.

As far as I can see, there is no way to make this work with a user-defined class. You can return 1 from __len__, but then you'll still get an Nx1 2D result, not a 1D array of length N. I don't see any way to make Numpy see a user-defined instance as a scalar.

There is a numpy bug about the apply_along_axis behavior, but surprisingly I can't find any discussion of the underlying issue that isscalar returns False for non-numpy objects. It may be that numpy just decided to punt and not guess whether user-defined types are vector or scalar. Still, it might be worth asking about this on the numpy list, as it seems odd to me that things like isscalar(object()) return False.

However, if as you say you don't care about performance anyway, it doesn't really matter. Just use your first way with the list comprehension, which already does what you want.



回答2:

The apply_along_axis is pure Python that you can look at and decode yourself. In this case it essentially does:

check = np.empty(child_array.shape,dtype=object)
for i in range(child_array.shape[1]):
    check[:,i] = Leaf(child_array[:,i])

In other words, it preallocates the container array, and then fills in the values with an iteration. That certainly is better than appending to the array, but rarely better than appending values to a list (which is what the comprehension is doing).

You could take the above template and adjust it to produce the array that you really want.

for i in range(check.shape[0]):
    check[i]=Leaf(child_array[i,:])

In quick tests this iteration times the same as the comprehension. The apply_along_axis, besides being wrong, is slower.