inequality comparison of numpy array with nan to a

Posted 2019-02-12 04:44

Question:

I am trying to set members of an array that are below a threshold to nan. This is part of a QA/QC process and the incoming data may already have slots that are nan.

So as an example my threshold might be -1000 and hence I would want to set -3000 to nan in the following array

x = np.array([np.nan,1.,2.,-3000.,np.nan,5.])

The following:

x[x < -1000.] = np.nan

produces the correct behavior but also raises a RuntimeWarning. The overhead of disabling the warning with

warnings.filterwarnings("ignore")
...
warnings.resetwarnings()

is rather heavy and potentially a bit unsafe.

Trying to index twice with fancy indexing as follows doesn't produce any effect:

nonan = np.where(~np.isnan(x))[0]
x[nonan][x[nonan] < -1000.] = np.nan

I assume this is because a copy is made due to the integer index or the use of indexing twice.
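
A minimal sketch that checks this assumption, using np.shares_memory (a call not used in the original post) to see whether the first indexing step returns a copy. The names sub, shares and still_there are only illustrative.

```python
import numpy as np

x = np.array([np.nan, 1., 2., -3000., np.nan, 5.])
nonan = np.where(~np.isnan(x))[0]

# Indexing with an integer array returns a new array, not a view of x.
sub = x[nonan]
shares = np.shares_memory(sub, x)

# Assigning into that copy therefore leaves x untouched.
sub[sub < -1000.] = np.nan
still_there = x[3]
```

Here shares is False and x[3] is still -3000.0, so the assignment in the double-indexing attempt lands in a temporary array that is then discarded.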

Does anyone have a relatively simple solution? It would be fine to use a masked array in the process, but the final product has to be an ndarray and I can't introduce new dependencies. Thanks.

Answer 1:

Any comparison (other than !=) of a NaN to a non-NaN value will always return False:

>>> x < -1000
array([False, False, False,  True, False, False], dtype=bool)

So you can simply ignore the fact that there are NaNs already in your array and do:

>>> x[x < -1000] = np.nan
>>> x
array([ nan,   1.,   2.,  nan,  nan,   5.])
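
The comparison rule can be checked directly. This is only a small sketch on the same array; the names lt, ge, eq and ne are illustrative.

```python
import numpy as np

x = np.array([np.nan, 1., 2., -3000., np.nan, 5.])

# Ordered and equality comparisons are False wherever x is NaN.
lt = x < -1000.    # [False, False, False,  True, False, False]
ge = x >= -1000.   # [False,  True,  True, False, False,  True]
eq = x == x        # [False,  True,  True,  True, False,  True]

# != is the exception: NaN compares unequal to everything, itself included.
ne = x != x        # [ True, False, False, False,  True, False]
```

Note that lt and ge are not complements of each other: the NaN slots are False in both.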

EDIT: I didn't see any warning when I ran the above, but if you really need to stay away from the NaNs, you can do something like:

mask = ~np.isnan(x)
mask[mask] &= x[mask] < -1000
x[mask] = np.nan
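
The middle line is dense, so here is the same snippet traced step by step. This is a sketch for illustration; the intermediate name below is not part of the answer.

```python
import numpy as np

x = np.array([np.nan, 1., 2., -3000., np.nan, 5.])

# True where x holds a real number: [F, T, T, T, F, T]
mask = ~np.isnan(x)

# Compare only the non-NaN values [1., 2., -3000., 5.] -> [F, F, T, F]
below = x[mask] < -1000

# Write those results back into the slots of mask that were True.
mask[mask] &= below
# mask is now [F, F, F, T, F, F]

x[mask] = np.nan
```

The comparison never sees a NaN, because x[mask] has already filtered them out.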


Answer 2:

One option is to disable the relevant warnings with numpy.errstate:

with numpy.errstate(invalid='ignore'):
    ...

To turn off the relevant warnings globally, use numpy.seterr.
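
Applied to the question's array, both options might look like the sketch below. The helper name nan_below and the second array y are made up for illustration.

```python
import numpy as np

def nan_below(x, threshold):
    """Hypothetical helper: set entries of x below threshold to NaN, in place."""
    # errstate changes the floating-point error handling only inside the
    # with-block; the previous settings are restored on exit.
    with np.errstate(invalid='ignore'):
        x[x < threshold] = np.nan
    return x

x = np.array([np.nan, 1., 2., -3000., np.nan, 5.])
nan_below(x, -1000.)

# Global variant: np.seterr returns the previous settings, so they can be
# put back afterwards.
old = np.seterr(invalid='ignore')
try:
    y = np.array([np.nan, -2000., 3.])
    y[y < -1000.] = np.nan
finally:
    np.seterr(**old)
```

The context manager is usually the safer choice, since it cannot leave the global settings changed if an exception is raised.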



Answer 3:

I personally ignore the warnings using the np.errstate context manager in the answer already given, as the code clarity is worth the extra time, but here is an alternative.

# given
x = np.array([np.nan, 1., 2., -3000., np.nan, 5.])

# apply NaNs as desired
mask = np.zeros(x.shape, dtype=bool)
np.less(x, -1000, out=mask, where=~np.isnan(x))
x[mask] = np.nan

# expected output and comparison
y = np.array([np.nan, 1., 2., np.nan, np.nan, 5.])
assert np.allclose(x, y, rtol=0., atol=1e-14, equal_nan=True)

The np.less ufunc takes an optional where argument and evaluates the comparison only where it is True, unlike the np.where function, which evaluates both options and then picks the relevant one. Where it is False, the output keeps whatever the out array already held, so you set the desired default by initializing out (here, to all False).
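
To make the role of out visible, the sketch below runs the same call twice with differently pre-filled output arrays. The names valid, out_true and out_false are illustrative only.

```python
import numpy as np

x = np.array([np.nan, 1., 2., -3000., np.nan, 5.])
valid = ~np.isnan(x)

# Pre-filled with True: the NaN slots (0 and 4) are skipped and keep True.
out_true = np.ones(x.shape, dtype=bool)
np.less(x, -1000., out=out_true, where=valid)
# out_true is [True, False, False, True, True, False]

# Pre-filled with False: the NaN slots keep False, which is the mask we want.
out_false = np.zeros(x.shape, dtype=bool)
np.less(x, -1000., out=out_false, where=valid)
# out_false is [False, False, False, True, False, False]
```

Only the pre-fill differs between the two calls, which is why the answer starts from np.zeros.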



Answer 4:

A little late, but this is how I would do it:

x = np.array([np.nan,1.,2.,-3000.,np.nan,5.]) 

igood=np.where(~np.isnan(x))[0]
x[igood[x[igood]<-1000.]]=np.nan


Answer 5:

np.less() has a where argument that controls where the operation will be applied. So you could do:

x[np.less(x, -1000., where=~np.isnan(x))] = np.nan
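
One caveat: the NumPy ufunc documentation notes that when out is omitted, positions where the where condition is False are left uninitialized. In this particular case that is harmless, because those positions are already NaN, but a variant with an explicit out avoids relying on it. A sketch, with below as an illustrative name:

```python
import numpy as np

x = np.array([np.nan, 1., 2., -3000., np.nan, 5.])

# Supplying out makes the mask well defined at the NaN positions too.
below = np.less(x, -1000., out=np.zeros(x.shape, dtype=bool),
                where=~np.isnan(x))
x[below] = np.nan
```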