I am working with Python on Spark and reading my dataset from a .csv file whose first few rows are:
17 0.2 7
17 0.2 7
39 1.3 7
19 1 7
19 0 7
When I read from the file line by line with the code below:
from pyspark.mllib.regression import LabeledPoint

# Load and parse the data
def parsePoint(line):
    # first value is the label, the rest are the features
    values = [float(x) for x in line.replace(',', ' ').split(' ')]
    return LabeledPoint(values[0], values[1:])
I get this error:
Traceback (most recent call last):
File "<stdin>", line 3, in parsePoint
ValueError: could not convert string to float: "17"
Any help is greatly appreciated.
Following the comments below this answer, you should use:
[float(x.strip(' "')) for x in line.split(',')]
You do not need to replace ',' with ' '; simply split on ',' and then remove leading and trailing whitespace and quotes (x.strip(' "')) before converting to float.
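For example, assuming the file actually contains quoted, comma-separated fields (which the quotes in your ValueError suggest), this gives:

>>> line = '"17","0.2","7"'
>>> [float(x.strip(' "')) for x in line.split(',')]
[17.0, 0.2, 7.0]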
Also, have a look at the csv module, which may simplify your work.
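As a minimal sketch (assuming comma-separated, possibly quoted fields), the csv module takes care of the splitting and quote handling for you:

import csv
from pyspark.mllib.regression import LabeledPoint

def parsePoint(line):
    # csv.reader expects an iterable of lines; wrap the single line in a list.
    # skipinitialspace=True also drops spaces that follow the commas.
    row = next(csv.reader([line], skipinitialspace=True))
    return LabeledPoint(float(row[0]), [float(x) for x in row[1:]])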
Below is the answer to the original question (before comments).
You need to use .split() instead of .split(' '). You have multiple consecutive space characters in your line, so splitting on ' ' results in empty strings, e.g. your first line is split into:
['17', '', '0.2', '', '7']
The problem is those empty strings, which (obviously) cannot be converted to float.
Using split() will solve the problem, thanks to the behaviour of split when its sep argument is None (or not present):
If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed).
See the documentation of str.split, and a small example to understand the difference:
>>> sp5 = ' ' * 5
>>> sp5.split()
[]
>>> sp5.split(' ')
['', '', '', '', '', '']
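Applied to your first line (which, judging by the split result above, has two spaces between fields), plain split() yields exactly the tokens you want:

>>> '17  0.2  7'.split()
['17', '0.2', '7']
>>> [float(x) for x in '17  0.2  7'.split()]
[17.0, 0.2, 7.0]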