I am trying to process data saved as CSV that may have missing values in an unknown number of columns (up to around 30). I want to set those missing values to 0 using genfromtxt's filling_values argument. Here is a minimal working example for NumPy 1.6.2 running in ActiveState ActivePython 2.7 (32-bit) on Windows 7.
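import numpy
# Write the sample CSV before reading it; the last row has an empty value in column b.
text = "a,b,c,d\n1,2,3,4\n5,,7,8"
b = open('test.txt','w')
b.write(text)
b.close()
# Baseline read with no filling_values: the missing value comes back as nan.
a = numpy.genfromtxt('test.txt',delimiter=',',names=True)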
print "plain",a
a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values=0)
print "filling_values=0",a
a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={1:0})
print "filling_values={1:0}",a
a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={0:0})
print "filling_values={0:0}",a
a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={None:0})
print "filling_values={None:0}",a
And the result:
plain [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
filling_values=0 [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
filling_values={1:0} [(1.0, 2.0, 3.0, 4.0) (5.0, 0.0, 7.0, 8.0)]
filling_values={0:0} [(1.0, 2.0, 3.0, 4.0) (5.0, nan, 7.0, 8.0)]
Traceback (most recent call last):
File "C:\Users\tolivo.EE\Documents\active\eng\python\sizer\testGenfromtxt.py", line 20, in <module>
a = numpy.genfromtxt('test.txt',delimiter=',',names=True,filling_values={None:0})
File "C:\Users\tolivo.EE\AppData\Roaming\Python\Python27\site-packages\numpy\lib\npyio.py", line 1451, in genfromtxt
filling_values[key] = val
TypeError: list indices must be integers, not NoneType
From the NumPy user guide I would expect filling_values=0 and filling_values={None:0} to work, but the first has no effect and the second throws an error. Specifying the affected column explicitly (filling_values={1:0}) does work, but I have a large number of columns, and how many is not known until the user makes a selection, so I am looking for a way to set the fill value for every column automatically, as the user guide hints at.
I imagine I can probably count the columns in advance and create a dict to pass as the value to filling_values in the meantime, but is there a better way?
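For reference, the workaround I have in mind would look roughly like the sketch below. It is not something from the NumPy documentation: build_filling_values is a hypothetical helper name of my own, and it assumes the first line of the file is the header row and that counting its fields gives the column count. It uses only the standard library to build the dict.

```python
import csv

def build_filling_values(path, delimiter=',', fill=0):
    """Hypothetical helper: map every column index to the same fill value.

    Assumes the first line of the file at `path` is the header row.
    """
    with open(path) as f:
        header = next(csv.reader(f, delimiter=delimiter))
    # One entry per column, keyed by integer index as in filling_values={1:0}.
    return dict((i, fill) for i in range(len(header)))

# Intended use with the file from the example above (not executed here):
# fills = build_filling_values('test.txt')   # {0: 0, 1: 0, 2: 0, 3: 0}
# a = numpy.genfromtxt('test.txt', delimiter=',', names=True, filling_values=fills)
```

Since filling_values={1:0} already works for a single column, I would expect a dict covering every index to behave the same way, but it costs an extra pass over the header for each file.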