How to keep numpy from broadcasting when creating

Posted 2019-01-20 02:38

Question:

I am trying to store a list of differently shaped arrays as a dtype=object array using np.save (I'm aware I could just pickle the list, but I'm really curious how to do this). If I do this:

import numpy as np
np.save('test.npy', [np.zeros((2, 2)), np.zeros((3,3))])

it works. But this:

np.save('test.npy', [np.zeros((2, 2)), np.zeros((2,3))])

it gives me an error:

ValueError: could not broadcast input array from shape (2,2) into shape (2)

I guess np.save converts the list into an array first, so I tried:

x=np.array([np.zeros((2, 2)), np.zeros((3,3))])
y=np.array([np.zeros((2, 2)), np.zeros((2,3))])

This has the same effect (the first one works, the second one doesn't). The resulting x behaves as expected:

>>> x.shape
(2,)
>>> x.dtype
dtype('O')
>>> x[0].shape
(2, 2)
>>> x[0].dtype
dtype('float64')

I also tried to force the 'object' dtype:

np.array([np.zeros((2, 2)), np.zeros((2,3))], dtype=object)

Without success. It seems numpy tries to broadcast the arrays with equal first dimensions into the new array and only realizes too late that their shapes are different. Oddly, it seems to have worked at one point, so I'm really curious what the difference is and how to do this properly.


EDIT: I figured out the case where it worked before. The only difference seems to be that the numpy arrays in the list had a different data type: it works with dtype('<f8') but not with dtype('float64'), and I'm not even sure what the difference is.
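For reference, '<f8' is the explicitly little-endian spelling of a 64-bit float, while 'float64' means native byte order. On a little-endian machine (which most x86 and ARM systems are) the two spellings compare equal, so the dtype name by itself is probably not what made the earlier attempt succeed. A quick check, assuming a standard NumPy install:

```python
import sys
import numpy as np

# 'float64' always means native byte order, i.e. '=f8'.
native = np.dtype('float64')
little = np.dtype('<f8')

print(sys.byteorder)      # 'little' on most desktop machines
print(native == little)   # True whenever sys.byteorder == 'little'
print(native.str)         # '<f8' on a little-endian machine
```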


EDIT 2: I found a very non-pythonic way to solve my issue. I'm adding it here in case it helps to show what I wanted to do:

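import numpy as np
# Keep the arrays in a plain list; calling np.array() on this list raises the same ValueError.
array_list = [np.zeros((2, 2)), np.zeros((2, 3))]
# Pre-allocate a 1-D object array and fill it one element at a time.
save_array = np.empty((len(array_list),), dtype=object)
for idx, arr in enumerate(array_list):
    save_array[idx] = arr
np.save('test.npy', save_array)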
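Saving the filled object array works, but loading it back requires allow_pickle=True in NumPy 1.16.3 and later, because the individual arrays inside an object array are stored with pickle. A minimal round-trip sketch; the temporary directory is only there so the example does not leave files behind:

```python
import os
import tempfile
import numpy as np

# Keep the inputs in a plain list so np.array never tries to stack them.
array_list = [np.zeros((2, 2)), np.zeros((2, 3))]

# Pre-allocate a 1-D object array and fill it element by element.
save_array = np.empty(len(array_list), dtype=object)
for idx, arr in enumerate(array_list):
    save_array[idx] = arr

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.npy')
    np.save(path, save_array)
    # Object arrays are pickled inside the .npy file, so loading needs allow_pickle.
    loaded = np.load(path, allow_pickle=True)

print(loaded.shape)               # (2,)
print([a.shape for a in loaded])  # [(2, 2), (2, 3)]
```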

Answer 1:

One of the first things that np.save does is

arr = np.asanyarray(arr)

So yes it is trying to turn your list into an array.
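This conversion is easy to observe in the equal-shape case: passing a list of two (2, 2) arrays to np.save stores one stacked float array rather than two separate arrays. A small sketch using an in-memory buffer instead of a file:

```python
import io
import numpy as np

same = [np.zeros((2, 2)), np.zeros((2, 2))]

# np.save converts its argument with np.asanyarray before writing.
print(np.asanyarray(same).shape)   # (2, 2, 2)

buf = io.BytesIO()                 # in-memory file, just for the demo
np.save(buf, same)
buf.seek(0)
loaded = np.load(buf)              # plain float64 data, no pickle needed
print(loaded.shape, loaded.dtype)  # (2, 2, 2) float64
```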

Constructing an object array from arbitrarily sized arrays or lists is tricky. np.array(...) tries to create as high a dimensional array as it can, even attempting to concatenate the inputs if possible. The surest way is to do what you did: make the empty array and fill it.
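The create-and-fill approach can be wrapped in a small reusable helper. to_object_array is a hypothetical name used only for this sketch; it always returns a 1-D object array with one slot per input, whatever the input shapes are:

```python
import numpy as np

def to_object_array(items):
    """Return a 1-D object array whose i-th element is items[i], unchanged."""
    items = list(items)
    out = np.empty(len(items), dtype=object)
    # Assign one element at a time so NumPy never tries to infer a common shape.
    for i, item in enumerate(items):
        out[i] = item
    return out

ragged = to_object_array([np.zeros((2, 2)), np.zeros((2, 3))])
print(ragged.shape)     # (2,)
print(ragged[1].shape)  # (2, 3)
```

This is the same idea as the loop in the question's EDIT 2, packaged so it also covers the cases where np.array would stack or raise.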

A slightly more compact way of constructing the object array:

In [21]: alist = [np.zeros((2, 2)), np.zeros((2,3))]
In [22]: arr = np.empty(len(alist), dtype=object)
In [23]: arr[:] = alist
In [24]: arr
Out[24]: 
array([array([[ 0.,  0.],
       [ 0.,  0.]]),
       array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])], dtype=object)

Here are 3 scenarios:

Arrays that match in shape, combine into a 3d array:

In [27]: np.array([np.zeros((2, 2)), np.zeros((2,2))])
Out[27]: 
array([[[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]]])
In [28]: _.shape
Out[28]: (2, 2, 2)

Arrays that don't match on the first dimension create an object array:

In [29]: np.array([np.zeros((2, 2)), np.zeros((3,2))])
Out[29]: 
array([array([[ 0.,  0.],
       [ 0.,  0.]]),
       array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])], dtype=object)
In [30]: _.shape
Out[30]: (2,)
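Newer NumPy releases (1.24 and later) no longer build this ragged object array implicitly: the same call without dtype=object raises a ValueError. Passing dtype=object explicitly gives the result shown above in both old and new versions; a minimal check:

```python
import numpy as np

# First dimensions differ (2 vs 3), so NumPy stops at depth 1
# and stores each input array as a separate object.
arr = np.array([np.zeros((2, 2)), np.zeros((3, 2))], dtype=object)

print(arr.shape)               # (2,)
print(arr.dtype)               # object
print([a.shape for a in arr])  # [(2, 2), (3, 2)]
```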

And an awkward intermediate case (which may even be described as a bug), where the first dimensions match but the second ones don't:

In [31]: np.array([np.zeros((2, 2)), np.zeros((2,3))])
...
ValueError: could not broadcast input array from shape (2,2) into shape (2)

It's as though it initialized a (2,2,2) array and then found that the (2,3) array wouldn't fit. The current logic doesn't allow it to back up and create the object array as it did in the previous scenario.

If you wanted to put the two (2,2) arrays in an object array, you'd have to use the create-and-fill logic.
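A minimal sketch of that equal-shape case, contrasting what np.array does with the create-and-fill version:

```python
import numpy as np

a = np.zeros((2, 2))
b = np.ones((2, 2))

stacked = np.array([a, b])          # np.array stacks equal shapes
print(stacked.shape)                # (2, 2, 2)

holder = np.empty(2, dtype=object)  # create ...
holder[0] = a                       # ... and fill one slot at a time
holder[1] = b
print(holder.shape)                 # (2,)
print(holder[1].shape)              # (2, 2)
```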