I am using numpy 1.16.2.
In brief, I am wondering how to add an object-type field to a structured array. The standard way via the recfunctions module throws an error, and I suppose there is a reason for this. Therefore, I wonder whether there is anything wrong with my workaround. Furthermore, I would like to understand why this workaround is necessary and whether I need to use extra caution when accessing the newly created array.
Now here come the details:
I have a numpy structured array:
import numpy as np
a = np.zeros(3, dtype={'names':['A','B','C'], 'formats':['int','int','float']})
for i in range(len(a)):
    a[i] = i
I want to add another field "test" of type object to the array a. The standard way of doing this is to use numpy's recfunctions module:
import numpy.lib.recfunctions as rf
b = rf.append_fields(a, "test", [None]*len(a))
This code throws an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-4a7be4f94686> in <module>
----> 1 rf.append_fields(a, "test", [None]*len(a))
D:\_Programme\Anaconda3\lib\site-packages\numpy\lib\recfunctions.py in append_fields(base, names, data, dtypes, fill_value, usemask, asrecarray)
718 if dtypes is None:
719 data = [np.array(a, copy=False, subok=True) for a in data]
--> 720 data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
721 else:
722 if not isinstance(dtypes, (tuple, list)):
D:\_Programme\Anaconda3\lib\site-packages\numpy\lib\recfunctions.py in <listcomp>(.0)
718 if dtypes is None:
719 data = [np.array(a, copy=False, subok=True) for a in data]
--> 720 data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
721 else:
722 if not isinstance(dtypes, (tuple, list)):
D:\_Programme\Anaconda3\lib\site-packages\numpy\core\_internal.py in _view_is_safe(oldtype, newtype)
492
493 if newtype.hasobject or oldtype.hasobject:
--> 494 raise TypeError("Cannot change data-type for object array.")
495 return
496
TypeError: Cannot change data-type for object array.
A similar error has been discussed here, though the issue is old and I do not know whether the behaviour I am observing is actually a bug. Here I am informed that views of structured arrays containing general objects are not supported.
I therefore built a workaround:
b = np.empty(len(a), dtype=a.dtype.descr+[("test", object)])
b[list(a.dtype.names)] = a
This works. Nonetheless, I have the following questions:
Questions
- Why is this workaround necessary? Is this just a bug?
- Working with the new array b seems to be no different from working with a. The variable c = b[["A", "test"]] is clearly a view of the data of b. So why would they say that views on the array b are not supported? Do I have to treat c with extra caution?
Define the new dtype:
Create a new array of the right size and dtype:
Copy values from a to b by field name:
Many of the rf functions do this field-by-field copy. rf.append_fields uses this after it initializes its output array.
In earlier versions a multifield index produced a copy, so expressions like
b[list(a.dtype.names)] = a
would not work.
I don't know if it's worth trying to figure out what rf.append_fields is doing. Those functions are somewhat old and not heavily used (note the special import), so it's entirely likely that they have bugs or edge cases that don't work. The functions that I've examined work much as I demonstrated: make a new dtype and a result array, then copy data by field name.
In recent releases there have been changes in how multiple fields are accessed. There are some new functions in recfunctions to facilitate working with structured arrays, such as repack_fields.
https://docs.scipy.org/doc/numpy/user/basics.rec.html#accessing-multiple-fields
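The three steps at the top of this answer (new dtype, new array, copy by field name) can be sketched roughly as follows. This is a minimal sketch, not the original code: the explicit i8/f8 formats are an assumption standing in for the platform-dependent 'int' and 'float', and the multifield assignment at the end assumes a numpy version (1.16 or later) in which a multifield index is a view.

```python
import numpy as np

# Assumption: explicit 64-bit formats instead of the platform-dependent 'int'/'float'.
a = np.zeros(3, dtype=[("A", "i8"), ("B", "i8"), ("C", "f8")])
for name in a.dtype.names:
    a[name] = np.arange(3)

# 1. Define the new dtype: the existing fields plus an object field.
new_dt = np.dtype(a.dtype.descr + [("test", object)])

# 2. Create a new array of the right size and dtype.
b = np.zeros(a.shape, dtype=new_dt)

# 3. Copy values from a to b by field name, then fill the new field.
for name in a.dtype.names:
    b[name] = a[name]
b["test"] = None

# Alternative to the loop: assign through a multifield index
# (relies on the multifield index being a view, not a copy).
b2 = np.zeros(a.shape, dtype=new_dt)
b2[list(a.dtype.names)] = a
```

One caveat with a.dtype.descr: for dtypes with padding or explicit offsets it can contain extra unnamed padding entries, so for such arrays it may be safer to build the field list from a.dtype.names and a.dtype.fields instead.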
I don't know if any of that applies to the append_fields problem. I see there's also a section about structured arrays with objects, but I haven't studied that:
https://docs.scipy.org/doc/numpy/user/basics.rec.html#viewing-structured-arrays-containing-objects
This line apparently refers to the use of the view method. Views created by field indexing, whether by a single name or a multifield list, are not affected.
The error in append_fields comes from this operation:
There's no problem creating a compound dtype with object dtypes:
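A small sketch of the contrast described here, under the assumption that data is the object array that append_fields builds from [None]*len(a). The view call mirrors line 720 of the traceback; the variable names view_failed, dt, arr and c are only for illustration. The exact error message may differ between numpy versions, so only the exception type is checked.

```python
import numpy as np

# What append_fields effectively does with the new column (traceback line 720).
data = np.array([None] * 3)            # dtype is object
try:
    data.view([("test", data.dtype)])  # reinterpret as a one-field structured array
    view_failed = False
except TypeError:
    view_failed = True                 # the .view method rejects object dtypes

# Creating a compound dtype that contains an object field is fine.
dt = np.dtype([("A", "i8"), ("test", object)])
arr = np.zeros(3, dtype=dt)

# Field indexing (here a multifield list) is not the .view method and still works.
# Assumes numpy >= 1.16, where a multifield index returns a view.
c = arr[["A", "test"]]
c["A"][0] = 42                         # writes through to arr
```

So the restriction sits in the view method (and in anything that calls it internally), not in structured arrays with object fields as such.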
But I don't see any recfunctions that are capable of joining a and data.
view can be used to change the field names of a:
but trying to do so for b fails for the same reason:
If I start with an object dtype array and try to view it with i8 (same size dtype), I get this same error. So the restriction on view of an object dtype isn't limited to structured arrays. The need for such a restriction in the case of object pointer to i8 makes sense. The need for such a restriction in the case of embedding the object pointer in a compound dtype might not be so compelling. It might even be overkill, or just a case of playing it safe and simple.
Note that the test in line 493 checks the hasobject property of both the new and old dtypes. A more nuanced test might check whether both have hasobject set, but I suspect the logic could get quite complex. Sometimes a simple prohibition is safer (and easier) than a complex set of tests.
In further testing
but trying to do the same on b, or even a subset of its fields, produces the familiar error:
I have to first use repack to make an object-less copy:
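A minimal sketch of that repack step, assuming repack means rf.repack_fields (mentioned earlier) and using a plain field-renaming view as a stand-in for the follow-up operation. The field names x, y, z and the explicit i8/f8 formats are hypothetical choices for illustration.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Structured array with an object field, as in the question (64-bit formats assumed).
dt = np.dtype([("A", "i8"), ("B", "i8"), ("C", "f8"), ("test", object)])
b = np.zeros(3, dtype=dt)
b["A"] = np.arange(3)

# Multifield index: selects the non-object fields but keeps b's memory layout.
sub = b[["A", "B", "C"]]

# repack_fields returns a packed copy that no longer overlaps the object field.
packed = rf.repack_fields(sub)

# With no object dtype involved, the view method is allowed again.
renamed = packed.view([("x", "i8"), ("y", "i8"), ("z", "f8")])
```

Because packed is a copy, changes made through renamed do not propagate back to b.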