I have a big dataset that contains numeric data, and in some of its rows a variable number of spaces delimits the columns, like:
4 5 6
7 8 9
2 3 4
When I use this line:
dataset=numpy.loadtxt("dataset.txt", delimiter=" ")
I get this error:
ValueError: Wrong number of columns at line 2
How can I change the code to ignore multiple spaces as well?
The default for delimiter is 'any whitespace'. If you leave the delimiter argument out, loadtxt copes with multiple spaces.

Use the numpy.genfromtxt function:
>>> import numpy as np
>>> dataset = np.genfromtxt("dataset.txt")
>>> dataset
array([[ 4., 5., 6.], [ 7., 8., 19.], [ 2., 3., 4.], [ 1., 3., 204.]])
This is from the numpy documentation:
By default, genfromtxt assumes delimiter=None, meaning that the line is split along white spaces (including tabs) and that consecutive white spaces are considered as a single white space.
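To see the whitespace handling without touching a file, here is a minimal sketch that feeds both readers an in-memory string. The sample text is made up to mirror the question (uneven runs of spaces plus one tab), and the plain `str.split` lines at the end are only an analogy for why a single-space delimiter produces extra empty fields, not a claim about how numpy parses internally.

```python
import io
import numpy as np

# Hypothetical sample mirroring the question: columns separated by
# a varying number of spaces (and one tab).
text = "4 5   6\n7    8 9\n2\t3  4\n"

# With no delimiter argument, both readers split on runs of whitespace.
a = np.loadtxt(io.StringIO(text))
b = np.genfromtxt(io.StringIO(text))

print(a.shape)               # (3, 3)
print(np.array_equal(a, b))  # True

# Analogy in plain Python: splitting on exactly one space yields empty
# fields between consecutive spaces; splitting on any whitespace does not.
print("4 5   6".split(" "))  # ['4', '5', '', '', '6']
print("4 5   6".split())     # ['4', '5', '6']
```

So for the original call, dropping delimiter=" " from numpy.loadtxt("dataset.txt", delimiter=" ") should be enough if the file only contains whitespace-separated numbers.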
Hope this helps!