Unexpected nan behaviour when summing a numpy array

Posted 2019-09-22 00:53

This is an interesting topic, given that it can lead to unexpected results in code. Suppose I have an array as follows:

import numpy as np

X = np.array([np.nan,np.nan,np.nan,np.nan,np.nan])

np.nanmean(X) rightly warns that the slice being averaged is empty and returns nan. However, summing the array with np.nansum(X) returns 0.0. While this is mathematically true (the sum of nothing is 0), the result one might expect is np.nan.
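The difference can be reproduced directly. This is a minimal sketch, assuming a reasonably recent NumPy; the exact warning text may differ between versions, so the code only counts the warnings rather than matching the message.

```python
import warnings

import numpy as np

X = np.array([np.nan] * 5)

# np.nanmean warns on an all-NaN input and returns nan.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean = np.nanmean(X)

# np.nansum treats every NaN as zero, so an all-NaN input sums to 0.0.
total = np.nansum(X)

print(mean, total, len(caught))
```

The silent 0.0 from np.nansum is the case that needs guarding against, since nothing in the return value distinguishes it from a genuine sum of zero.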

For example, I have a function which, if a file of ice data doesn't exist, creates an array filled with nans (180x360 points, with each point representing a lat/lon degree). This array is then passed to a function which sums over it to find the total amount of ice. If the expected value is 9-10 million km² and nansum returns 0, this can be misleading. It is especially difficult to spot if ice extents are around 0 anyway. In the plot below a missing data file is clearly what leads to an ice sum of 0.0, but not all cases are so clear.

[Image: plot in which a missing data file produces an ice sum of 0.0]

I've seen this discussed on development websites, and want to know (A) why there isn't a kwarg option for np.nansum() to return np.nan if required, and (B) whether there is a function which returns True/False if the entire matrix is full of nan.

1 answer
我命由我不由天
#2 · 2019-09-22 01:30

Docs:

In NumPy versions <= 1.8.0 Nan is returned for slices that are all-NaN or empty. In later versions zero is returned.

Workaround:

def nansumwrapper(a, **kwargs):
    # Return nan when every element of a is NaN; otherwise defer to np.nansum.
    if np.isnan(a).all():
        return np.nan
    else:
        return np.nansum(a, **kwargs)

a = np.array([np.nan, np.nan])
b = np.array([np.nan, 1., 2.])


nansumwrapper(a)
# nan

nansumwrapper(b)
# 3.0

You can pass kwargs to np.nansum():

c = np.arange(12, dtype=np.float64).reshape(4,3)  # np.float_ was removed in NumPy 2.0
c[2:4, 1] = np.nan

nansumwrapper(c, axis=1)
# array([  3.,  12.,  14.,  20.])
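Note that with an axis argument the wrapper above only returns nan when the whole array is NaN; a single row or column that is entirely NaN still sums to 0.0. A per-slice variant might look like the sketch below. The name nansum_keep_nan is hypothetical (it is not a NumPy function), and only the axis and keepdims arguments are handled.

```python
import numpy as np


def nansum_keep_nan(a, axis=None, keepdims=False):
    """Like np.nansum, but all-NaN slices give nan (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    total = np.nansum(a, axis=axis, keepdims=keepdims)
    # True wherever every element of the reduced slice is NaN.
    all_nan = np.isnan(a).all(axis=axis, keepdims=keepdims)
    result = np.where(all_nan, np.nan, total)
    # Hand back a plain float when the reduction collapses to a single value.
    return float(result) if result.ndim == 0 else result


d = np.arange(12, dtype=np.float64).reshape(4, 3)
d[2, 1] = np.nan   # partly missing row: remaining values are summed
d[3, :] = np.nan   # fully missing row: reported as nan

row_sums = nansum_keep_nan(d, axis=1)
print(row_sums)
```

Here the last row is reported as nan rather than 0.0, while the partly missing row is still summed over its valid entries.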