How to populate an existing numpy array with speci

Posted 2020-04-18 05:04

Question:

Let's say that I have this initial numpy array with some fixed dtype:

array = numpy.array([(1, 'a'), (2, 'b')],
                    numpy.dtype([('idfield',numpy.int32),
                                 ('textfield', '|S256')]))

Now I need to populate this array in a for loop, so I do this:

for val in value:
    array = numpy.append(array,
                         numpy.array([(val[0], val[1])],
                                     numpy.dtype([('idfield', numpy.int32),
                                                  ('textfield', '|S256')])),
                         axis=0)

It works, but it really doesn't look good! I need to re-specify the dtype inside the for loop, even though it's obvious that I'm going to use the same dtype to populate my array.

Do you know a simpler way to achieve this?

Answer 1:

np.append is a thin wrapper around np.concatenate:

def append(arr, values, axis=None):
    arr = asanyarray(arr)
    if axis is None:
        if arr.ndim != 1:
            arr = arr.ravel()
        values = ravel(values)
        axis = arr.ndim-1
    return concatenate((arr, values), axis=axis)

In [89]: dt = np.dtype('U5,int')
In [90]: arr = np.array([('one',1)], dtype=dt)
In [91]: np.append(arr, ('two',2))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-91-bc17d9ad4a77> in <module>()
----> 1 np.append(arr, ('two',2))
 ...
-> 5166     return concatenate((arr, values), axis=axis)

TypeError: invalid type promotion

In this case, append does:

In [92]: np.ravel(('two',2))
Out[92]: array(['two', '2'], dtype='<U3')

turning the tuple into a 2-element string array. Now concatenate tries to join an array of dtype dt with the U3 array, and it can't. Nothing in append uses arr.dtype as the basis for turning values into an array; you need to do that yourself. numpy can only do so much to infer your intentions. :)

So if you specify the common dtype, it works:

In [93]: np.append(arr, np.array(('two',2),dt))
Out[93]: array([('one', 1), ('two', 2)], dtype=[('f0', '<U5'), ('f1', '<i4')])
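Since append never looks at arr.dtype, the conversion step could be wrapped in a small helper. The sketch below is only an illustration: append_like is a made-up name, not a numpy function, and it assumes the new values are a tuple or a list of tuples matching the fields of arr.

```python
import numpy as np

def append_like(arr, values):
    """Hypothetical helper (not part of numpy): coerce values to arr.dtype, then append."""
    # Build the new records with the existing dtype so it is never repeated.
    new = np.array(values, dtype=arr.dtype)
    # With axis=None, np.append ravels `new`, so a single 0-d record is accepted too.
    return np.append(arr, new)

dt = np.dtype('U5,int')
arr = np.array([('one', 1)], dtype=dt)

out = append_like(arr, ('two', 2))                       # one record
out2 = append_like(out, [('three', 3), ('four', 4)])     # several records
print(out2)
```

Each call still copies the whole array, so this is convenient for occasional appends rather than for long loops.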

I dislike append because new users often misuse it. Usually they think of it as a list append clone, which it is not.

But it does have one advantage - it promotes the dimensions of 0d inputs:

In [94]: np.concatenate([arr, np.array(('two',2),dt)])
...
ValueError: all the input arrays must have same number of dimensions

Making the 2nd array 1d works:

In [95]: np.concatenate([arr, np.array([('two',2)],dt)])
Out[95]: array([('one', 1), ('two', 2)], dtype=[('f0', '<U5'), ('f1', '<i4')])

append hides the dimensional adjustment that concatenate needs.
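To make the dimension issue visible, here is a minimal sketch that inspects ndim directly. Using np.atleast_1d to promote the single record is one possible choice here, not something append itself does.

```python
import numpy as np

dt = np.dtype('U5,int')
arr = np.array([('one', 1)], dtype=dt)

rec0d = np.array(('two', 2), dtype=dt)   # a bare tuple gives a 0-d structured array
rec1d = np.atleast_1d(rec0d)             # same record, promoted to shape (1,)
print(rec0d.ndim, rec1d.ndim)

# concatenate accepts the 1-d version because both inputs now have ndim == 1
joined = np.concatenate([arr, rec1d])
print(joined)
```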

But where possible it is better to create a list of arrays (or tuples) and do concatenate just once:

In [96]: alist = [('one',1),('two',2),('three',3)]
In [97]: ll = [np.array([x],dt) for x in alist]
In [98]: ll
Out[98]: 
[array([('one', 1)], dtype=[('f0', '<U5'), ('f1', '<i4')]),
 array([('two', 2)], dtype=[('f0', '<U5'), ('f1', '<i4')]),
 array([('three', 3)], dtype=[('f0', '<U5'), ('f1', '<i4')])]

In [100]: np.concatenate(ll)
Out[100]: 
array([('one', 1), ('two', 2), ('three', 3)],
      dtype=[('f0', '<U5'), ('f1', '<i4')])

But making the array directly from a list of tuples is even better:

In [101]: np.array(alist, dt)
Out[101]: 
array([('one', 1), ('two', 2), ('three', 3)],
      dtype=[('f0', '<U5'), ('f1', '<i4')])
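In a loop, that means collecting plain tuples in a Python list and converting once at the end. A small sketch, where source is made-up sample data standing in for whatever the loop really iterates over:

```python
import numpy as np

dt = np.dtype('U5,int')
source = {'one': 1, 'two': 2, 'three': 3}   # hypothetical input data

rows = []
for name, number in source.items():
    # List append is cheap; no array is copied inside the loop.
    rows.append((name, number))             # each record must be a tuple

# The dtype is used exactly once, after the loop.
arr = np.array(rows, dtype=dt)
print(arr)
```

The records need to be tuples (not lists) for numpy to treat each one as a single structured element.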


Answer 2:

As @juanpa.arrivillaga commented, it's cleaner to define your dtype only once:

array_dt = np.dtype([
    ('idfield', np.int32),
    ('textfield', '|S256')
])

Then convert your second list of values to an array with that dtype and concatenate:

array2 = np.array(value, array_dt)
array = np.concatenate([array, array2])
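Put together with the array from the question, a runnable version might look like the sketch below. The contents of value are invented sample data, and reading the dtype back from array.dtype is just one way to avoid spelling it out again.

```python
import numpy as np

array_dt = np.dtype([
    ('idfield', np.int32),
    ('textfield', '|S256')
])
array = np.array([(1, 'a'), (2, 'b')], dtype=array_dt)

value = [(3, 'c'), (4, 'd')]   # hypothetical sample data: a list of tuples

# Reuse the dtype already stored on the existing array.
array2 = np.array(value, dtype=array.dtype)
array = np.concatenate([array, array2])

print(array['idfield'])
print(array['textfield'])
```

If value holds lists rather than tuples, converting first with [tuple(v) for v in value] should give numpy the record structure it expects.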