NumPy recfunctions join_by TypeError

2019-07-10 00:11发布

问题:

I encounter a TypeError when I attempt to join a 'uint16' field to a structured array in NumPy 1.11 or 1.12 (Python 3.5).

import numpy as np
from numpy.lib import recfunctions as rfn
foo = np.array([(1,)],
               dtype=[('key', int)])
bar = np.array([(1,np.array([1,2,3]))],
               dtype=[('key', int), ('value', 'uint16', 3)])
rfn.join_by('key', foo, bar)

This is the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/lib/python3.5/site-packages/numpy/lib/recfunctions.py", line 986, in join_by
    output.sort(order=key)
  File "/home/user/anaconda3/lib/python3.5/site-packages/numpy/ma/core.py", line 5420, in sort
    sidx = self.filled(filler).argsort(axis=axis, kind=kind,
  File "/home/user/anaconda3/lib/python3.5/site-packages/numpy/ma/core.py", line 3668, in filled
    fill_value = _check_fill_value(fill_value, self.dtype)
  File "/home/user/anaconda3/lib/python3.5/site-packages/numpy/ma/core.py", line 470, in _check_fill_value
    fill_value = np.array(_recursive_set_fill_value(fill_value, ndtype),
  File "/home/user/anaconda3/lib/python3.5/site-packages/numpy/ma/core.py", line 436, in _recursive_set_fill_value
    output_value.append(np.array(fval, dtype=cdtype).item())
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

The same problem does not occur if I use a 'float16'.

import numpy as np
from numpy.lib import recfunctions as rfn
foo = np.array([(1,)],
               dtype=[('key', int)])
bar = np.array([(1,np.array([1,2,3]))],
               dtype=[('key', int), ('value', 'float16', 3)])
rfn.join_by('key', foo, bar)

Is this just a bug? Or is there some way to prevent this problem?

回答1:

This is a bug. This PR partially fixes it, but it seems you've stumbled across a can of worms relating to np.ma and subdtypes.

As for why it worked for float16 - None was being coerced into nan (a questionable feature), rather than erroring.

edit: PR is merged, this will be fixed in numpy 1.14