Python's and NumPy's NaN and set

Posted 2019-01-15 19:01

Question:

I ran into unexpected behavior with NumPy, Python's set, and NaN (not-a-number):

>>> set([np.float64('nan'), np.float64('nan')])
set([nan, nan])
>>> set([np.float32('nan'), np.float32('nan')])
set([nan, nan])
>>> set([np.float('nan'), np.float('nan')])
set([nan, nan])
>>> set([np.nan, np.nan])
set([nan])
>>> set([float('nan'), float('nan')])
set([nan, nan])

Here np.nan yields a single-element set, while NumPy's NaNs yield multiple NaNs in a set. So does float('nan')! Note also that:

>>> type(float('nan')) == type(np.nan)
True

I wonder how this difference comes about and what the rationale is behind the different behaviors.

Answer 1:

One of the properties of NaN is that NaN != NaN, unlike all other numbers. However, when CPython's set looks for an element at a hash index, it first checks whether the candidate is the very same object (the same id, as tested by is) as the member already stored there, and only then falls back to ==. If you have two objects with different ids that are both NaN, both checks fail, so you get two entries in the set. If both references point to the same object, the identity check succeeds and they collapse into a single entry.
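A minimal sketch of the rule described above, in plain Python (no NumPy needed). The helper same_for_set is hypothetical: it only mirrors the identity-then-equality comparison that set lookups apply once two hashes match; it is not a real API.

```python
# Two separately created NaN objects.
a = float('nan')
b = float('nan')

def same_for_set(x, y):
    """Hypothetical helper mirroring the comparison a set applies to a
    stored member after the hashes match: identity first, then ==."""
    return x is y or x == y

print(a == a)               # False: NaN never equals itself
print(same_for_set(a, a))   # True: the identity check short-circuits
print(same_for_set(a, b))   # False: different objects, and a != b
print(len({a, a}))          # 1: the same object collapses into one entry
print(len({a, b}))          # 2: distinct NaN objects stay separate
```

The value comparison alone can never merge NaNs; only the identity shortcut does.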

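As pointed out by others, np.nan is a single float object created when NumPy is imported, so every reference to it has the same id. By contrast, np.float64('nan'), np.float32('nan') and float('nan') create a new object on each call.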
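To see the id effect without installing NumPy, the sketch below uses math.nan as a stand-in for np.nan, on the assumption that both behave alike because each is one float object stored as a module attribute. The canonical_nan helper is hypothetical and shows one way to make a set keep at most one NaN by mapping every NaN to that shared object.

```python
import math

# math.nan is a single module-level float, so every reference to it
# is the same object (np.nan is likewise one shared float).
print(math.nan is math.nan)                 # True
print(len({math.nan, math.nan}))            # 1

# float('nan') builds a new object on each call.
print(len({float('nan'), float('nan')}))    # 2

def canonical_nan(values):
    """Hypothetical helper: replace every NaN (Python float or a float
    subclass) with the shared math.nan object so a set keeps only one."""
    return [math.nan if isinstance(x, float) and math.isnan(x) else x
            for x in values]

print(len(set(canonical_nan([float('nan'), float('nan'), 1.0]))))  # 2
```

The design choice here is to exploit the identity shortcut rather than fight NaN's inequality: once all NaNs are the same object, the set treats them as one member.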