Loading a dataset in Python (NumPy) when there are variable spaces delimiting columns

Posted 2019-07-08 20:51

I have a large dataset of numeric data, and in some of its rows the columns are separated by a variable number of spaces, like:

4 5 6
7  8    9
2 3 4

When I use this line:

dataset=numpy.loadtxt("dataset.txt", delimiter=" ")

I get this error:

ValueError: Wrong number of columns at line 2

How can I change the code so that it treats a run of several spaces as a single delimiter?

Tags: python numpy dataset whitespace delimiter

2 answers

做自己的国王

Post #2 · 2019-07-08 21:35

The default for delimiter is "any whitespace". If you leave the delimiter argument out of the loadtxt call, it copes with multiple spaces:

>>> from io import StringIO
>>> dataset = StringIO('''\
... 4 5 6
... 7 8     9
... 2 3 4''')
>>> import numpy
>>> dataset_as_numpy = numpy.loadtxt(dataset)
>>> dataset_as_numpy
array([[ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [ 2.,  3.,  4.]])
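To see why the explicit delimiter=" " fails while the default works, here is a small sketch using plain str.split. It only approximates how loadtxt tokenizes a line (it is not NumPy's actual implementation), but it shows the difference: splitting on one literal space turns every extra space into an empty field, so line 2 appears to have more columns than line 1, whereas splitting with no argument treats any run of whitespace as one separator.

```python
# The three sample rows from the question.
rows = ["4 5 6", "7  8    9", "2 3 4"]

# Splitting on a single literal space: each extra space produces an empty-string field.
single_space = [row.split(" ") for row in rows]

# Splitting with no argument: any run of whitespace counts as one separator.
any_whitespace = [row.split() for row in rows]

print([len(fields) for fields in single_space])    # [3, 7, 3]
print([len(fields) for fields in any_whitespace])  # [3, 3, 3]
print(single_space[1])                             # ['7', '', '8', '', '', '', '9']
```

The uneven field counts in the first case are what trigger the "wrong number of columns" complaint.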


啃猪蹄的小仙女

Post #3 · 2019-07-08 21:39

Use the numpy.genfromtxt function:

>>> import numpy as np
>>> dataset = np.genfromtxt("dataset.txt")
>>> dataset
array([[ 4., 5., 6.], [ 7., 8., 19.], [ 2., 3., 4.], [ 1., 3., 204.]])

This is from the numpy documentation:

By default, genfromtxt assumes delimiter=None, meaning that the line is split along white spaces (including tabs) and that consecutive white spaces are considered as a single white space.
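A minimal sketch of that behaviour, reading from an actual file path as in the question. It assumes the question's three sample rows stand in for dataset.txt (written here to a temporary file), with a tab added to row 3 to show that tabs are treated as whitespace too. Both loadtxt and genfromtxt are called without a delimiter argument.

```python
import os
import tempfile

import numpy as np

# Stand-in for the asker's dataset.txt: the sample rows, plus a tab in row 3.
sample = "4 5 6\n7  8    9\n2\t3 4\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(sample)
    path = handle.name

try:
    # No delimiter argument: both default to delimiter=None (split on any whitespace run).
    via_loadtxt = np.loadtxt(path)
    via_genfromtxt = np.genfromtxt(path)
finally:
    os.remove(path)  # clean up the temporary file

print(via_loadtxt.shape)                            # (3, 3)
print(np.array_equal(via_loadtxt, via_genfromtxt))  # True
```

The file is closed before it is read back, so the sketch should also work on platforms that do not allow reopening an open temporary file.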

Hope this helps!

