numpy recarray append_fields: can't append num

Posted 2019-07-10 16:19

Question:

I have a recarray containing various fields and I want to append an array of datetime objects on to it.

However, it seems like the append_fields function in numpy.lib.recfunctions won't let me add an array of objects.

Here's some example code:

import numpy as np
import datetime
import numpy.lib.recfunctions as recfun

dtype= np.dtype([('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
obs = np.array([(0.1,10.0),(0.2,11.0),(0.3,12.0)], dtype=dtype)

dates = np.array([datetime.datetime(2001,1,1,0),
    datetime.datetime(2001,1,1,0),
    datetime.datetime(2001,1,1,0)])

# This doesn't work:
recfun.append_fields(obs,'obdate',dates,dtypes=object)  # np.object is just an alias of object, and newer NumPy releases have removed it

I keep getting the error TypeError: Cannot change data-type for object array.

It seems to only be an issue with np.object arrays as I can append other fields ok. Am I missing something?

Answer 1:

The problem

In [143]: recfun.append_fields(obs,'test',np.array([None,[],1]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-143-5c3de23b09f7> in <module>()
----> 1 recfun.append_fields(obs,'test',np.array([None,[],1]))

/usr/local/lib/python3.5/dist-packages/numpy/lib/recfunctions.py in append_fields(base, names, data, dtypes, fill_value, usemask, asrecarray)
    615     if dtypes is None:
    616         data = [np.array(a, copy=False, subok=True) for a in data]
--> 617         data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
    618     else:
    619         if not isinstance(dtypes, (tuple, list)):

/usr/local/lib/python3.5/dist-packages/numpy/lib/recfunctions.py in <listcomp>(.0)
    615     if dtypes is None:
    616         data = [np.array(a, copy=False, subok=True) for a in data]
--> 617         data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
    618     else:
    619         if not isinstance(dtypes, (tuple, list)):

/usr/local/lib/python3.5/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
    363 
    364     if newtype.hasobject or oldtype.hasobject:
--> 365         raise TypeError("Cannot change data-type for object array.")
    366     return
    367 

TypeError: Cannot change data-type for object array.

So the problem is in the expression a.view([(name, a.dtype)]), which tries to make a single-field structured array from a. That works with dtypes like int and str, but fails with object. The failure comes from the core view handling (the _view_is_safe check in the traceback), so it isn't likely to change.

In [148]: x=np.arange(3)

In [149]: x.view([('test', x.dtype)])
Out[149]: 
array([(0,), (1,), (2,)], 
      dtype=[('test', '<i4')])

In [150]: x=np.array(['one','two'])

In [151]: x.view([('test', x.dtype)])
Out[151]: 
array([('one',), ('two',)], 
      dtype=[('test', '<U3')])

In [152]: x=np.array([[1],[1,2]], dtype=object)   # newer NumPy needs an explicit dtype=object for ragged input

In [153]: x
Out[153]: array([[1], [1, 2]], dtype=object)

In [154]: x.view([('test', x.dtype)])
...
TypeError: Cannot change data-type for object array.

The fact that recfunctions has to be imported separately suggests that it is something of a backwater: not widely used and not under active development. I haven't examined the code in detail, but I suspect a fix would be a kludge.

A fix

Here's a way of adding a new field from scratch. It performs the same basic actions as append_fields:

Define a new dtype, using the obs and the new field name and dtype:

In [158]: obs.dtype.descr
Out[158]: [('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')]

In [159]: obs.dtype.descr+[('TEST',object)]
Out[159]: [('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4'), ('TEST', object)]

In [160]: dt1 = np.dtype(obs.dtype.descr+[('TEST',object)])

Make an empty target array, and fill it by copying data by field name:

In [161]: newobs = np.empty(obs.shape, dtype=dt1)    
In [162]: for n in obs.dtype.names:
     ...:     newobs[n]=obs[n]

In [167]: dates
Out[167]: 
array([datetime.datetime(2001, 1, 1, 0, 0),
       datetime.datetime(2001, 1, 1, 0, 0),
       datetime.datetime(2001, 1, 1, 0, 0)], dtype=object)

In [168]: newobs['TEST']=dates

In [169]: newobs
Out[169]: 
array([( 0.1       ,  10., datetime.datetime(2001, 1, 1, 0, 0)),
       ( 0.2       ,  11., datetime.datetime(2001, 1, 1, 0, 0)),
       ( 0.30000001,  12., datetime.datetime(2001, 1, 1, 0, 0))], 
      dtype=[('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4'), ('TEST', 'O')])
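The steps above can be collected into a small reusable function. This is only a sketch: append_object_field is a hypothetical name, not part of NumPy, and it assumes a flat structured array with no custom offsets, titles, or alignment (those would be lost when the dtype is rebuilt).

```python
import datetime
import numpy as np


def append_object_field(base, name, data, dtype=object):
    """Return a copy of structured array `base` with one extra field.

    Hypothetical helper (not a NumPy function). It rebuilds the dtype
    field by field, so offsets, titles and alignment are not preserved.
    """
    if name in base.dtype.names:
        raise ValueError(f"field {name!r} already exists")
    # New dtype = existing fields plus the appended one.
    fields = [(n, base.dtype[n]) for n in base.dtype.names]
    new_dtype = np.dtype(fields + [(name, dtype)])
    # Allocate the target and copy the data over by field name.
    out = np.empty(base.shape, dtype=new_dtype)
    for n in base.dtype.names:
        out[n] = base[n]
    out[name] = data
    return out


dtype = np.dtype([('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
obs = np.array([(0.1, 10.0), (0.2, 11.0), (0.3, 12.0)], dtype=dtype)
dates = np.array([datetime.datetime(2001, 1, 1, 0)] * 3, dtype=object)

newobs = append_object_field(obs, 'obdate', dates)
print(newobs.dtype.names)
```

Unlike append_fields, this always returns a plain ndarray and does no masking or filling; the length of data has to match base (or broadcast to it).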

datetime64 alternative

With the native numpy datetimes (datetime64), append_fields works:

In [179]: dates64 = dates.astype('datetime64[D]')

In [180]: recfun.append_fields(obs,'test',dates64,usemask=False)
Out[180]: 
array([( 0.1       ,  10., '2001-01-01'),
       ( 0.2       ,  11., '2001-01-01'), ( 0.30000001,  12., '2001-01-01')], 
      dtype=[('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4'), ('test', '<M8[D]')])
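One thing to watch with this route is the datetime64 unit. The dates in the question are all at midnight, so [D] loses nothing, but for timestamps with a time of day a finer unit is needed. The following sketch uses made-up timestamps (not from the question) and the field name 'obdate' from the question's code to illustrate the difference and the conversion back to datetime.datetime objects.

```python
import datetime
import numpy as np
import numpy.lib.recfunctions as recfun

dtype = np.dtype([('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
obs = np.array([(0.1, 10.0), (0.2, 11.0), (0.3, 12.0)], dtype=dtype)

# Illustrative timestamps with a non-zero time of day.
dates = np.array([datetime.datetime(2001, 1, 1, 6, 30),
                  datetime.datetime(2001, 1, 2, 12, 0),
                  datetime.datetime(2001, 1, 3, 18, 45)])

# Microsecond resolution matches datetime.datetime, so nothing is lost.
dates_us = dates.astype('datetime64[us]')
# Day resolution discards the time of day.
dates_day = dates_us.astype('datetime64[D]')

# Append the native datetime field, as in the session above.
with_dates = recfun.append_fields(obs, 'obdate', dates_us, usemask=False)

# Convert the field back to datetime.datetime objects when needed.
as_objects = with_dates['obdate'].astype(object)
print(as_objects[0])
```

Converting a datetime64[D] array with astype(object) gives datetime.date objects rather than datetime.datetime, which is another reason to pick the unit deliberately.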

Note that append_fields has some bells and whistles that my from-scratch version above doesn't: fill values, masked arrays, recarray output, etc.

structured dates array

I could also create a single-field structured array holding just the dates:

In [197]: sdates = np.array([(i,) for i in dates],dtype=[('test',object)])
In [198]: sdates
Out[198]: 
array([(datetime.datetime(2001, 1, 1, 0, 0),),
       (datetime.datetime(2001, 1, 1, 0, 0),),
       (datetime.datetime(2001, 1, 1, 0, 0),)], 
      dtype=[('test', 'O')])

There must be a function that merges fields of existing arrays, but I'm not finding it.
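numpy.lib.recfunctions does have a merge_arrays function, but I have not checked how it copes with object fields, so here is a hand-rolled sketch instead. merge_struct_arrays is a hypothetical name, not a NumPy function; it assumes all inputs are flat structured arrays of the same shape with distinct field names.

```python
import datetime
import numpy as np


def merge_struct_arrays(*arrays):
    """Combine the fields of same-shaped structured arrays into one array.

    Hypothetical helper (not a NumPy function). np.dtype raises
    ValueError if two inputs share a field name.
    """
    shape = arrays[0].shape
    fields = []
    for arr in arrays:
        if arr.shape != shape:
            raise ValueError("all arrays must have the same shape")
        fields.extend((n, arr.dtype[n]) for n in arr.dtype.names)
    # Allocate the combined array and copy every field across by name.
    out = np.empty(shape, dtype=np.dtype(fields))
    for arr in arrays:
        for n in arr.dtype.names:
            out[n] = arr[n]
    return out


dtype = np.dtype([('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
obs = np.array([(0.1, 10.0), (0.2, 11.0), (0.3, 12.0)], dtype=dtype)
dates = [datetime.datetime(2001, 1, 1, 0)] * 3
sdates = np.array([(d,) for d in dates], dtype=[('test', object)])

merged = merge_struct_arrays(obs, sdates)
print(merged.dtype.names)
```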

previous work

This felt familiar:

https://github.com/numpy/numpy/issues/2346

TypeError when appending fields to a structured array of size ONE

Adding datetime field to recarray