I have a large dataset of numeric data, and in some of its rows the columns are separated by a variable number of spaces, like:
4 5 6
7 8 9
2 3 4
When I use this line:
dataset=numpy.loadtxt("dataset.txt", delimiter=" ")
I get this error:
ValueError: Wrong number of columns at line 2
How can I change the code to ignore multiple spaces as well?
The default for delimiter is 'any whitespace'. If you leave delimiter out of the loadtxt call, it copes with multiple spaces.
>>> from io import StringIO
>>> dataset = StringIO('''\
... 4 5 6
... 7 8 9
... 2 3 4''')
>>> import numpy
>>> dataset_as_numpy = numpy.loadtxt(dataset)
>>> dataset_as_numpy
array([[ 4., 5., 6.],
       [ 7., 8., 9.],
       [ 2., 3., 4.]])
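To see the difference on data whose spacing is visibly irregular, here is a minimal sketch. The sample text and the variable names are made up for illustration; only numpy.loadtxt and its delimiter parameter come from the discussion above.

```python
from io import StringIO

import numpy as np

# Hypothetical sample: columns separated by one or more spaces.
text = "4 5   6\n7  8 9\n2 3 4\n"

# Forcing a single-space delimiter turns each extra space into an empty field,
# so parsing fails with a ValueError (the exact message depends on the NumPy version).
try:
    np.loadtxt(StringIO(text), delimiter=" ")
    strict_failed = False
except ValueError:
    strict_failed = True

# With the default delimiter (None), a run of whitespace counts as one separator.
data = np.loadtxt(StringIO(text))

print(strict_failed)
print(data.shape)
```

The same call works on a file path, so dropping the delimiter argument from the original line should be enough.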
Use the numpy.genfromtxt function:
>>> import numpy as np
>>> dataset = np.genfromtxt("dataset.txt")
>>> dataset
array([[ 4., 5., 6.],
       [ 7., 8., 19.],
       [ 2., 3., 4.],
       [ 1., 3., 204.]])
This is from the numpy documentation:
By default, genfromtxt assumes delimiter=None, meaning that the line is split along white spaces (including tabs) and that consecutive white spaces are considered as a single white space.
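As a quick check of that behaviour, the following sketch feeds genfromtxt an in-memory sample that mixes multiple spaces and a tab. The sample text is hypothetical and stands in for dataset.txt.

```python
from io import StringIO

import numpy as np

# Hypothetical sample mixing multiple spaces and a tab between columns.
text = "4   5 6\n7 8\t9\n2  3   4\n"

# delimiter is left at its default (None), so any run of whitespace splits columns.
dataset = np.genfromtxt(StringIO(text))

print(dataset)
```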
Hope this helps!