Reading an .arff file and trying to ignore the hea

Posted 2019-03-06 18:21

Question:

I am new to Python and need some help with my code. I am reading an .arff file in my Jupyter notebook using Python 2.7. I would like to know which argument I need to pass to arff.loadarff, or whether there is another way to do it, so that I can ignore the header of my data.

train, meta = arff.loadarff(open('train.arff', 'r'))

After reading the file I perform some mathematical operations and get the error shown below.

I hope someone can help me figure this out.

train,meta = arff.loadarff(open('train.arff', 'r'))
train = pd.DataFrame(train)
print(train)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-192-3b2868d1fd43> in <module>()
----> 1 ne = getNeighbors(X_train, y_train, X_test, k = 3)
      2 print(ne)

<ipython-input-191-75b4da86d04e> in getNeighbors(X_train, y_train, X_test, k)
      6             for (trainpoint,y_train_label) in zip(X_train,y_train):
      7                 # calculate the distance and append it to a distances_label with the associated label.
----> 8                 distances_label.append((distance(testpoint, trainpoint), y_train_label))
      9             k_neighbors_with_labels += [sorted(distances_label)[0:k]] # sort the distances and taken the first k neighbors
     10         return k_neighbors_with_labels

<ipython-input-186-22e861402349> in distance(testpoint, trainpoint)
      2 def distance(testpoint, trainpoint):
      3     # distance between testpoint and trainpoint.
----> 4     dist = np.sqrt(np.sum(np.power(float(testpoint)-float(trainpoint), 2)))
      5     return dis
      6 

ValueError: could not convert string to float: sepal_length

Answer 1:

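Your distance function assumes that testpoint is a numeric array.

But what if it isn't?

You are using pandas DataFrames, which are not plain arrays. Iterating over a DataFrame (as zip(X_train, y_train) does) yields its column names rather than its rows, so float() receives the string 'sepal_length' and raises the ValueError.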
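To make this concrete, here is a minimal sketch of the behaviour and one possible way around it. It assumes a small in-memory ARFF snippet standing in for train.arff; the attribute names sepal_width and class, the data values, and the variable names are illustrative assumptions, not taken from the original file. As far as I know, loadarff takes no argument for skipping the header: it parses the header into meta and returns only the data rows, so the header does not need to be ignored at load time.

```python
import io

import numpy as np
import pandas as pd
from scipy.io import arff

# Hypothetical stand-in for train.arff (contents are made up for illustration).
arff_text = u"""@relation iris_sample
@attribute sepal_length numeric
@attribute sepal_width numeric
@attribute class {setosa,versicolor}
@data
5.1,3.5,setosa
7.0,3.2,versicolor
"""

# loadarff accepts a file-like object; the header ends up in `meta`,
# the rows end up in `data` (a NumPy structured array).
data, meta = arff.loadarff(io.StringIO(arff_text))
train = pd.DataFrame(data)

# Iterating over a DataFrame yields column labels, not rows.
# This is where the string 'sepal_length' in the error comes from.
labels_seen = [item for item in train]

# Select the numeric feature columns and convert them to a plain NumPy array,
# so that iteration yields one row (one sample) at a time.
X_train = train[["sepal_length", "sepal_width"]].values
y_train = train["class"].values


def distance(testpoint, trainpoint):
    """Euclidean distance between two numeric vectors."""
    testpoint = np.asarray(testpoint, dtype=float)
    trainpoint = np.asarray(trainpoint, dtype=float)
    return np.sqrt(np.sum(np.power(testpoint - trainpoint, 2)))


# Distance between the first two samples, computed row against row.
d = distance(X_train[0], X_train[1])
```

With plain arrays passed to getNeighbors, zip(X_train, y_train) pairs each row with its label instead of pairing column names with labels. Note that the traceback also shows `return dis` inside distance, which would likely need to be `return dist` once the conversion error is resolved.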