I am new to Python and I need some help with my code. I am reading an .arff file in a Jupyter notebook using Python 2.7. I would like to know which argument I need to pass to arff.loadarff, or whether there is another way to do it, so that I can ignore the header of my data.
train, meta = arff.loadarff(open('train.arff', 'r'))
After reading the file I perform some mathematical operations and get the error shown below.
I hope someone can help me figure it out.
train,meta = arff.loadarff(open('train.arff', 'r'))
train = pd.DataFrame(train)
print(train)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-192-3b2868d1fd43> in <module>()
----> 1 ne = getNeighbors(X_train, y_train, X_test, k = 3)
2 print(ne)
<ipython-input-191-75b4da86d04e> in getNeighbors(X_train, y_train, X_test, k)
6 for (trainpoint,y_train_label) in zip(X_train,y_train):
7 # calculate the distance and append it to a distances_label with the associated label.
----> 8 distances_label.append((distance(testpoint, trainpoint), y_train_label))
9 k_neighbors_with_labels += [sorted(distances_label)[0:k]] # sort the distances and taken the first k neighbors
10 return k_neighbors_with_labels
<ipython-input-186-22e861402349> in distance(testpoint, trainpoint)
2 def distance(testpoint, trainpoint):
3 # distance between testpoint and trainpoint.
----> 4 dist = np.sqrt(np.sum(np.power(float(testpoint)-float(trainpoint), 2)))
5 return dis
6
ValueError: could not convert string to float: sepal_length
You assume that testpoint is an array in your distance function. But what if it isn't?
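You are using pandas DataFrames, which are not plain arrays: iterating over a DataFrame in a for loop or in zip yields its column names, not its rows, and that is why float() ends up receiving the string sepal_length.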
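To illustrate, here is a minimal sketch of one possible fix. It assumes the features are numeric and the last column holds the label. The small DataFrame, its column names other than sepal_length, and the test point are made up for the example and stand in for the data loaded from train.arff. The idea is to convert the DataFrame to plain NumPy arrays before looping, so that iteration goes row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame built from train.arff.
train = pd.DataFrame(
    {
        "sepal_length": [5.1, 7.0, 6.3],
        "sepal_width": [3.5, 3.2, 3.3],
        "label": ["setosa", "versicolor", "virginica"],
    },
    columns=["sepal_length", "sepal_width", "label"],
)

# Iterating over a DataFrame yields column names, not rows.
column_names = list(train)

# Convert to plain NumPy arrays so that iteration yields one row at a time.
X_train = train[["sepal_length", "sepal_width"]].values.astype(float)
y_train = train["label"].values


def distance(testpoint, trainpoint):
    # Euclidean distance between two 1-D numeric arrays.
    diff = np.asarray(testpoint, dtype=float) - np.asarray(trainpoint, dtype=float)
    return np.sqrt(np.sum(np.power(diff, 2)))


# Made-up test point with the same two features.
testpoint = np.array([5.0, 3.4])

# Pair every training row with its label and keep the closest one.
distances = [(distance(testpoint, row), label) for row, label in zip(X_train, y_train)]
nearest = sorted(distances)[0]
print(column_names)
print(nearest)
```

Note that the subtraction is done on whole arrays here instead of calling float() on them, since float() only accepts a single value.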