Numpy structured array fails basic numpy operation

Posted 2020-03-26 04:09

Question:

I wish to manipulate named numpy arrays (add, multiply, concatenate, ...)

I defined structured arrays:

types=[('name1', int), ('name2', float)]
a = np.array([2, 3.3], dtype=types)
b = np.array([4, 5.35], dtype=types)

a and b are created such that

a
array([(2, 2. ), (3, 3.3)], dtype=[('name1', '<i8'), ('name2', '<f8')])

but I really want a['name1'] to be just 2, not array([2, 3])

Similarly, I want a['name2'] to be just 3.3

This way I could sum c=a+b, which is expected to be an array of length 2, where c['name1'] is 6 and c['name2'] is 8.65

How can I do that?

Answer 1:

Define a structured array:

In [125]: dt = np.dtype([('f0','U10'),('f1',int),('f2',float)])
In [126]: a = np.array([('one',2,3),('two',4,5.5),('three',6,7)],dt)
In [127]: a
Out[127]: 
array([('one', 2, 3. ), ('two', 4, 5.5), ('three', 6, 7. )],
      dtype=[('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])

And an object dtype array with the same data

In [128]: A = np.array([('one',2,3),('two',4,5.5),('three',6,7)],object)
In [129]: A
Out[129]: 
array([['one', 2, 3],
       ['two', 4, 5.5],
       ['three', 6, 7]], dtype=object)

Addition works because it (iteratively) delegates the action to all elements

In [130]: A+A
Out[130]: 
array([['oneone', 4, 6],
       ['twotwo', 8, 11.0],
       ['threethree', 12, 14]], dtype=object)

Addition of structured arrays does not work:

In [131]: a+a
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-131-6ff992d1ddd5> in <module>()
----> 1 a+a

TypeError: ufunc 'add' did not contain a loop with signature matching types 
dtype([('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')]) dtype([('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')]) 
dtype([('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])

Let's try addition field by field:

In [132]: aa = np.zeros_like(a)
In [133]: for n in a.dtype.names: aa[n] = a[n] + a[n]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-133-68476e5d579e> in <module>()
----> 1 for n in a.dtype.names: aa[n] = a[n] + a[n]

TypeError: ufunc 'add' did not contain a loop with signature matching types 
dtype('<U10') dtype('<U10') dtype('<U10')

Oops, that doesn't quite work: as the traceback shows, the string dtype has no addition defined. But we can handle the string field separately:

In [134]: aa['f0'] = a['f0']
In [135]: for n in a.dtype.names[1:]: aa[n] = a[n] + a[n]
In [136]: aa
Out[136]: 
array([('one',  4,  6.), ('two',  8, 11.), ('three', 12, 14.)],
      dtype=[('f0', '<U10'), ('f1', '<i8'), ('f2', '<f8')])
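
The field-by-field loop above can be wrapped in a small helper. This is only a sketch, not a NumPy API: `add_fields` is a hypothetical name, and copying non-numeric fields from the first operand is an assumption about what is wanted.

```python
import numpy as np

def add_fields(x, y):
    """Add two structured arrays of the same dtype, field by field.

    Hypothetical helper: numeric fields are summed; any other field
    (e.g. a fixed-width string) is copied from x unchanged.
    """
    if x.dtype != y.dtype:
        raise TypeError("both arrays must share the same structured dtype")
    out = np.zeros_like(x)
    for name in x.dtype.names:
        if np.issubdtype(x.dtype[name], np.number):
            out[name] = x[name] + y[name]
        else:
            out[name] = x[name]
    return out

dt = np.dtype([('f0', 'U10'), ('f1', int), ('f2', float)])
a = np.array([('one', 2, 3), ('two', 4, 5.5), ('three', 6, 7)], dt)
aa = add_fields(a, a)
print(aa)
```

Checking the field's dtype instead of slicing `names[1:]` means the helper does not depend on where the string field sits in the dtype.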

Or we can change the string dtype to object:

In [137]: dt1 = np.dtype([('f0',object),('f1',int),('f2',float)])
In [138]: b = np.array([('one',2,3),('two',4,5.5),('three',6,7)],dt1)
In [139]: b
Out[139]: 
array([('one', 2, 3. ), ('two', 4, 5.5), ('three', 6, 7. )],
      dtype=[('f0', 'O'), ('f1', '<i8'), ('f2', '<f8')])
In [140]: bb = np.zeros_like(b)
In [141]: for n in b.dtype.names: bb[n] = b[n] + b[n]
In [142]: bb
Out[142]: 
array([('oneone',  4,  6.), ('twotwo',  8, 11.), ('threethree', 12, 14.)],
      dtype=[('f0', 'O'), ('f1', '<i8'), ('f2', '<f8')])

Python strings do have an __add__, defined as concatenation. Numpy string dtypes don't have that definition. Python strings can also be multiplied by an integer, but raise an error for any other operand type.
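
If the string field should stay a fixed-width string rather than become an object field, `np.char.add` performs element-wise concatenation and could stand in for `+` on that one field. A minimal sketch, assuming the same `dt` as above; the concatenated strings can be longer than the original field, so the target field must be wide enough to hold them.

```python
import numpy as np

dt = np.dtype([('f0', 'U10'), ('f1', int), ('f2', float)])
a = np.array([('one', 2, 3), ('two', 4, 5.5), ('three', 6, 7)], dt)

# Element-wise string concatenation on the string field only.
doubled = np.char.add(a['f0'], a['f0'])
print(doubled.tolist())   # ['oneone', 'twotwo', 'threethree']
```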

My guess is that pandas resorts to something like what I just did. I doubt that it implements dataframe addition in compiled code (except in some special cases). It probably works column by column where the dtype allows. It also seems to switch freely to object dtype (for example, for a column holding both np.nan and a string). Timings might confirm my guess (I don't have pandas installed on this OS).



Answer 2:

According to the documentation, the right way to make your arrays is:

types=[('name1', int), ('name2', float)]
a = np.array([(2, 3.3)], dtype=types)
b = np.array([(4, 5.35)], dtype=types)

This generates a and b as you want them:

a['name1']
array([2])

But summing them is not as straightforward as with conventional numpy arrays, so I also suggest using pandas:

names=['name1','name2']
a=pd.Series([2,3.3],index=names)
b=pd.Series([4,5.35],index=names)
a+b
name1    6.00
name2    8.65
dtype: float64
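
Note that the Series above holds both values as float64, so name1 comes back as 6.00. If the integer field should stay an integer, one alternative is a DataFrame built directly from the structured array, which keeps one dtype per column. This is a sketch of that idea, not something from the original answer:

```python
import numpy as np
import pandas as pd

types = [('name1', int), ('name2', float)]
a = np.array([(2, 3.3)], dtype=types)
b = np.array([(4, 5.35)], dtype=types)

# DataFrame accepts a structured array and makes one column per field.
c = pd.DataFrame(a) + pd.DataFrame(b)
print(c.dtypes)             # name1 stays an integer column, name2 a float column

# Convert back to a NumPy record array if needed.
c_rec = c.to_records(index=False)
print(c_rec['name1'][0])    # 6
```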