
Unexpected nan behaviour when summing a numpy array

Posted 2019-09-22 01:01

Question:

This is an interesting topic because it can lead to unexpected results in code. Suppose I have an array as follows:

import numpy as np

X = np.array([np.nan,np.nan,np.nan,np.nan,np.nan])

np.nanmean(X) rightly emits a warning that the slice being averaged is empty, and it returns nan. However, np.nansum(X) returns 0.0. That is mathematically defensible (the sum of nothing is 0), but the result one might expect here is np.nan.
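A small sketch makes the asymmetry visible. It records warnings instead of printing them so they can be inspected. It assumes a NumPy version later than 1.8.0, and the exact warning text may differ between versions:

```python
import warnings

import numpy as np

X = np.array([np.nan, np.nan, np.nan, np.nan, np.nan])

# Capture warnings so we can see which call emits one.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean = np.nanmean(X)  # nan, with a RuntimeWarning about an empty slice
    total = np.nansum(X)  # 0.0, silently: NaNs are treated as zero

print(mean, total)
print([w.category.__name__ for w in caught])
```

Only np.nanmean signals that there was no valid data. np.nansum gives no hint.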

For example, I have a function that creates an array of NaNs if a file of ice data doesn't exist (180x360 points, each point representing one degree of latitude/longitude). This array is then passed to a function that sums over it to get the total amount of ice. If the expected value is 9-10 million km², a nansum of 0 is misleading, and this is especially hard to spot when the real ice extent is close to 0 anyway. In the plot below, a missing data file clearly leads to an ice sum of 0.0, but not all cases are so clear.
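The thread does not mention one alternative: NumPy masked arrays. `np.ma.masked_invalid` masks NaN (and inf) cells. Summing a fully masked array gives `np.ma.masked` rather than 0.0, so a missing file cannot silently look like zero ice. This is a swapped-in technique, not the author's code. The two grids and the `masked_total` helper below are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical grid for a missing ice-data file: 180 x 360 one-degree cells, all NaN.
missing = np.full((180, 360), np.nan)

# Hypothetical grid with a few valid cells (values are illustrative only).
partial = np.full((180, 360), np.nan)
partial[0, :3] = [1.0, 2.0, 3.0]


def masked_total(grid):
    """Sum the non-NaN cells; a grid with no valid cells yields np.ma.masked."""
    return np.ma.masked_invalid(grid).sum()


print(masked_total(missing))  # --  (the masked constant, not 0.0)
print(masked_total(partial))  # 6.0
```

Callers then have to check for `np.ma.masked` explicitly. That makes the missing-data case hard to overlook.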

I've seen this discussed on development websites, and I want to know (A) why np.nansum() has no keyword option to return np.nan when required, and (B) whether there is a function that returns True/False if the entire array is full of NaN.
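For part (B), combining `np.isnan` with `.all()` gives the check in one line. In the sketch below, `is_all_nan` is a hypothetical helper name, and the grids are illustrative:

```python
import numpy as np


def is_all_nan(a):
    """True if every element of `a` is NaN (also True for an empty array)."""
    return bool(np.isnan(a).all())


X = np.full((180, 360), np.nan)  # hypothetical all-missing grid
Y = X.copy()
Y[90, 180] = 0.5                 # hypothetical single valid cell

print(is_all_nan(X))  # True
print(is_all_nan(Y))  # False
```

Note that `.all()` on an empty array is vacuously True. An empty input is therefore treated the same as an all-NaN one.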

Answer 1:

Docs:

In NumPy versions <= 1.8.0, NaN is returned for slices that are all-NaN or empty. In later versions, zero is returned.

Workaround:

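import numpy as np

def nansumwrapper(a, **kwargs):
    # Return NaN only if every element of the whole array is NaN;
    # otherwise defer to np.nansum (any kwargs are passed through).
    if np.isnan(a).all():
        return np.nan
    else:
        return np.nansum(a, **kwargs)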

a = np.array([np.nan, np.nan])
b = np.array([np.nan, 1., 2.])


nansumwrapper(a)
# nan

nansumwrapper(b)
# 3.0

Keyword arguments are passed through to np.nansum():

c = np.arange(12, dtype=np.float64).reshape(4, 3)
c[2:4, 1] = np.nan

nansumwrapper(c, axis=1)
# array([  3.,  12.,  14.,  20.])
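The wrapper above tests the whole array before summing. With `axis=1`, a row that is entirely NaN would therefore still come out as 0.0 as long as some other row has data. A per-slice variant is sketched below. `nansum_or_nan` is a hypothetical name, and the sketch only supports `axis` and `keepdims`. It applies the same reduction to the NaN mask and uses `np.where` to put NaN back wherever a slice had no valid values:

```python
import numpy as np


def nansum_or_nan(a, axis=None, keepdims=False):
    """Like np.nansum, but NaN wherever the reduced slice is all-NaN or empty."""
    a = np.asarray(a, dtype=np.float64)
    total = np.nansum(a, axis=axis, keepdims=keepdims)
    # Same reduction over the NaN mask: True where a slice has no valid value.
    all_nan = np.isnan(a).all(axis=axis, keepdims=keepdims)
    result = np.where(all_nan, np.nan, total)
    # np.where returns a 0-d array for a full reduction; unwrap it to a scalar.
    return result[()] if result.ndim == 0 else result


c = np.arange(12, dtype=np.float64).reshape(4, 3)
c[2:4, 1] = np.nan
c[1, :] = np.nan  # make one whole row missing

row_sums = nansum_or_nan(c, axis=1)  # 3.0, nan, 14.0, 20.0
print(row_sums)
print(nansum_or_nan(np.array([np.nan, np.nan])))  # nan
```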