Why does "numpy.mean" return 'inf'?

Published 2020-06-12 04:32

Question:

I need to calculate the column means of an array with more than 1000 rows.

np.mean(some_array) gives me inf as output,

but I am pretty sure the values are OK. I am loading a CSV from here into my Data variable, and the column 'Cement' looks healthy to me:

In[254]:np.mean(Data[:230]['Cement'])
Out[254]:275.75

But if I increase the number of rows, the problem starts:

In [259]:np.mean(Data[:237]['Cement'])
Out[259]:inf

But when I look at the data:

In [261]:Data[230:237]['Cement']
Out[261]:
 array([[ 425.  ],
        [ 333.  ],
        [ 250.25],
        [ 491.  ],
        [ 160.  ],
        [ 229.75],
        [ 338.  ]], dtype=float16)

I cannot find a reason for this behaviour.

P.S. This happens in Python 3.x on Wakari (a cloud-based IPython service), with NumPy version 1.8.1.

I am loading the data with:

No_Col = 9

# the file uses a comma as the decimal separator, so convert it first
conv = lambda valstr: float(valstr.replace(',', '.'))

# apply the converter to every column
c = {}
for i in range(No_Col):
    c[i] = conv

Data = np.genfromtxt(get_data, dtype=np.float16, delimiter='\t',
                     skip_header=0, names=True, converters=c)
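For reference, the loading step can be reproduced with a small inline sample (the column names and values below are made up for illustration, not taken from the real file). Passing encoding explicitly ensures the converters receive str rather than bytes on newer NumPy versions:

```python
import numpy as np
from io import StringIO

# Hypothetical two-column sample mimicking the real file: tab-separated,
# comma as the decimal separator, header row providing the field names.
sample = "Cement\tWater\n425,0\t162,5\n333,0\t228,0\n"

# Convert the comma decimal separator before parsing as float
conv = lambda valstr: float(valstr.replace(',', '.'))
converters = {i: conv for i in range(2)}

data = np.genfromtxt(StringIO(sample), dtype=np.float16, delimiter='\t',
                     names=True, converters=converters, encoding='utf-8')
print(data['Cement'])  # the two Cement values, stored as float16
```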

Answer 1:

I will guess that the problem is precision (as others have also commented). Quoting directly from the documentation for mean():

Notes

The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.

Since your array is of type float16, you have very limited precision and range: the largest value float16 can represent is about 65504, so the running sum of a few hundred values around 275 overflows to inf before the division ever happens. Specifying dtype=np.float64 gives the accumulator enough range to avoid the overflow. Also see the examples in the mean() documentation.
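A minimal sketch of the overflow, assuming values similar to the question's data. Note that newer NumPy releases internally promote float16 to a float32 accumulator inside mean(), so the raw sum shows the effect more reliably than mean() does on a current installation:

```python
import numpy as np

# float16 can represent values only up to ~65504, so a running sum of
# 1000 values near 275 overflows to inf long before the division.
a = np.full(1000, 275.75, dtype=np.float16)

print(np.sum(a))                     # inf: the accumulator is float16 too
print(np.sum(a, dtype=np.float64))   # 275750.0: higher-precision accumulator
print(np.mean(a, dtype=np.float64))  # 275.75: the fix suggested above
```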



Tags: python numpy