Consider the array
x = np.array(['1', '2', 'a'])
Trying to convert it to a float array raises an exception:
x.astype(float)
ValueError: could not convert string to float: a
Does numpy provide any efficient way to coerce this into a numeric array, replacing non-numeric values with something like NaN?
Alternatively, is there an efficient numpy function equivalent to np.isnan, but which also tests for non-numeric elements like letters?
You can convert an array of strings into an array of floats (with NaNs) using np.genfromtxt:
In [83]: np.set_printoptions(precision=3, suppress=True)
In [84]: np.genfromtxt(np.array(['1','2','3.14','1e-3','b','nan','inf','-inf']))
Out[84]: array([ 1. , 2. , 3.14 , 0.001, nan, nan, inf, -inf])
Here is a way to identify "numeric" strings:
In [34]: x
Out[34]:
array(['1', '2', 'a'],
dtype='|S1')
In [35]: x.astype('unicode')
Out[35]:
array([u'1', u'2', u'a'],
dtype='<U1')
In [36]: np.char.isnumeric(x.astype('unicode'))
Out[36]: array([ True, True, False], dtype=bool)
Note that "numeric" means a unicode string that contains only digit characters, that is, characters that have the Unicode numeric value property. It does not include the decimal point, so u'1.3' is not considered "numeric".
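If the strings may contain decimal points or exponents, one workaround is to try float() on each element and build a boolean mask from the result. This is only a sketch: is_float_string is a made-up helper name, not a numpy function, and the Python-level loop will be slower than a vectorized call on large arrays.

```python
import numpy as np

def is_float_string(s):
    """Hypothetical helper: True if float() can parse the string s."""
    try:
        float(s)
        return True
    except ValueError:
        return False

x = np.array(['1', '2', 'a', '1.3', '1e-3'])

# Boolean mask of the elements that float() accepts.
mask = np.array([is_float_string(s) for s in x])

# Start from all-NaN and fill in only the parseable entries.
values = np.full(x.shape, np.nan)
values[mask] = x[mask].astype(float)
```

Keep in mind that float() also accepts strings such as 'nan' and 'inf', so those count as numeric here.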
If you happen to be using pandas as well, you could use the pd.to_numeric() function:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: x = np.array(['1', '2', 'a'])
In [4]: pd.to_numeric(x, errors='coerce')
Out[4]: array([ 1., 2., nan])
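To get an np.isnan-style test for non-numeric elements, you can apply np.isnan to the coerced result. A small sketch, assuming pandas is installed; the variable names are just for illustration:

```python
import numpy as np
import pandas as pd

x = np.array(['1', '2', 'a'])

# Strings that cannot be parsed become NaN instead of raising.
coerced = pd.to_numeric(x, errors='coerce')

# True where the original element could not be read as a number.
non_numeric = np.isnan(coerced)
```

This also flags elements that were literally 'nan' in the input, since they cannot be told apart from coerced values afterwards.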