Append Row(s) to a NumPy Record Array

Published 2020-05-27 12:13

Question:

Is there a way to append a row to a NumPy rec.array()? For example,

import numpy as np

x1 = np.array([1, 2, 3, 4])
x2 = np.array(['a', 'dd', 'xyz', '12'])
x3 = np.array([1.1, 2, 3, 4])
r = np.rec.fromarrays([x1, x2, x3], names='a,b,c')

# Desired operation (pseudocode):
append(r, (5, 'cc', 43.0), axis=0)

The easiest way would be to extract all the columns as ndarray objects, add the new element to each column, and then rebuild the rec.array(). Unfortunately, this method would be memory inefficient. Is there another way to do this without taking the rec.array() apart and rebuilding it?

Cheers,

Eli

Answer 1:

You can resize numpy arrays in-place. This is faster than converting to lists and then back to numpy arrays, and it uses less memory too.

print(r.shape)
# (4,)
r.resize(5)
print(r.shape)
# (5,)
r[-1] = (5, 'cc', 43.0)
print(r)

# [(1, 'a', 1.1000000000000001) 
#  (2, 'dd', 2.0) 
#  (3, 'xyz', 3.0) 
#  (4, '12', 4.0)
#  (5, 'cc', 43.0)]

If there is not enough memory to expand an array in-place, the resizing (or appending) operation may force NumPy to allocate space for an entirely new array and copy the old data to the new location. That, naturally, is rather slow so you should try to avoid using resize or append if possible. Instead, pre-allocate arrays of sufficient size from the very beginning (even if somewhat larger than ultimately necessary).
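To illustrate the pre-allocation advice, here is a minimal sketch. The capacity of 8, the field layout, and the names `buf` and `n_rows` are assumptions made for the example, not part of the original answer. The idea is to allocate once, keep a counter of filled rows, and slice off the unused tail at the end.

```python
import numpy as np

# Field layout matching the question's example; 'U3' holds strings of up to 3 characters.
dtype = [('a', 'i4'), ('b', 'U3'), ('c', 'f8')]

capacity = 8   # assumed upper bound on the number of rows
buf = np.zeros(capacity, dtype=dtype).view(np.recarray)
n_rows = 0     # number of rows actually filled so far

rows = [(1, 'a', 1.1), (2, 'dd', 2.0), (3, 'xyz', 3.0), (4, '12', 4.0), (5, 'cc', 43.0)]
for row in rows:
    buf[n_rows] = row   # write into the next free slot; nothing is reallocated
    n_rows += 1

r = buf[:n_rows]        # view on the filled part only, no copy
print(r.shape)
print(r.a)
```

Because `r` is only a view on `buf`, the unused slots still occupy memory; that is the trade-off for never having to copy.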



Answer 2:

# Pass names explicitly; otherwise the fields come back as f0, f1, f2
np.rec.fromrecords(r.tolist() + [(5, 'cc', 43.)], names=r.dtype.names)

This still takes the array apart, but by rows instead of by columns. Maybe that is better?
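A related variant that skips the round trip through Python lists is `np.concatenate` with a one-row array of the same dtype. This is a different technique from the one-liner in this answer, shown only as a sketch; the names `new_row` and `r2` are made up for the example. It still allocates a new array and copies the old rows, so it does not address the memory concern in the question.

```python
import numpy as np

# Rebuild the record array from the question.
x1 = np.array([1, 2, 3, 4])
x2 = np.array(['a', 'dd', 'xyz', '12'])
x3 = np.array([1.1, 2, 3, 4])
r = np.rec.fromarrays([x1, x2, x3], names='a,b,c')

# One-row array with exactly the same dtype, so the fields line up.
new_row = np.array([(5, 'cc', 43.0)], dtype=r.dtype)

# concatenate returns a new array; view it as a recarray to keep attribute access.
r2 = np.concatenate([r, new_row]).view(np.recarray)
print(r2.shape)
print(r2.b)
```

The original `r` is left untouched, and field names and types carry over because both inputs share `r.dtype`.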



Answer 3:

Extending @unutbu's answer, here is a more general function that appends any number of rows:

def append_rows(arrayIN, NewRows):
    """Append rows to numpy recarray.

    Arguments:
      arrayIN: a numpy recarray that should be expanded
      NewRows: list of tuples, one per new row, matching the fields of `arrayIN`

    Idea: Resize the recarray in-place if possible.
    (only reasonable for small arrays)

    >>> arrayIN = np.array([(1, 'a', 1.1), (2, 'dd', 2.0), (3, 'x', 3.0)],
    ...                    dtype=[('a', '<i4'), ('b', '|S3'), ('c', '<f8')])
    >>> NewRows = [(4, '12', 4.0), (5, 'cc', 43.0)]
    >>> append_rows(arrayIN, NewRows)
    >>> print(arrayIN)
    [(1, 'a', 1.1) (2, 'dd', 2.0) (3, 'x', 3.0) (4, '12', 4.0) (5, 'cc', 43.0)]

    Source: http://stackoverflow.com/a/1731228/2062965
    """
    # Calculate the number of old and new rows
    len_arrayIN = arrayIN.shape[0]
    len_NewRows = len(NewRows)
    # Resize the old recarray
    arrayIN.resize(len_arrayIN + len_NewRows, refcheck=False)
    # Write the new rows into the slots just added.
    # Indexing from len_arrayIN avoids the [-0:] pitfall when NewRows is empty.
    arrayIN[len_arrayIN:] = NewRows

Comment

I want to stress that pre-allocating an array that is at least big enough is the most reasonable solution (if you have an idea of the final size of the array)! Pre-allocation also saves a lot of time, because the data never has to be copied to a new location.
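If the final size is not known in advance, one common compromise is to over-allocate and double the capacity whenever the buffer fills up, so that copies happen only occasionally. The class below is a hypothetical helper written for this illustration (`GrowableRecords` and its methods are not part of NumPy), and it is a sketch rather than a tuned implementation.

```python
import numpy as np


class GrowableRecords:
    """Hypothetical helper: a structured array that grows by doubling its capacity."""

    def __init__(self, dtype, capacity=4):
        self._buf = np.zeros(max(1, capacity), dtype=dtype)
        self._n = 0  # number of rows filled so far

    def append(self, row):
        if self._n == len(self._buf):
            # Buffer is full: allocate twice as much and copy the old rows once.
            new_buf = np.zeros(2 * len(self._buf), dtype=self._buf.dtype)
            new_buf[:self._n] = self._buf
            self._buf = new_buf
        self._buf[self._n] = row
        self._n += 1

    def to_recarray(self):
        # View on the filled rows only (no copy).
        return self._buf[:self._n].view(np.recarray)


records = GrowableRecords([('a', 'i4'), ('b', 'U3'), ('c', 'f8')], capacity=2)
for row in [(1, 'a', 1.1), (2, 'dd', 2.0), (3, 'xyz', 3.0), (4, '12', 4.0), (5, 'cc', 43.0)]:
    records.append(row)

r = records.to_recarray()
print(len(r))
print(r.b)
```

With a starting capacity of 2, appending five rows triggers two reallocations (to 4 and then to 8 slots) instead of one per row.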



Tags: python numpy