Here is my code:
import pandas as pa
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
def get_accuracy(X_train, y_train, y_test):
perceptron = Perceptron(random_state=241)
perceptron.fit(X_train, y_train)
result = accuracy_score(y_train, y_test)
return result
test_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-test.csv")
test_data.columns = ["class", "f1", "f2"]
train_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-train.csv")
train_data.columns = ["class", "f1", "f2"]
accuracy = get_accuracy(train_data[train_data.columns[1:]], train_data[train_data.columns[0]], test_data[test_data.columns[0]])
print(accuracy)
I don't understand why I get this error:
Traceback (most recent call last):
File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 35, in <module>
accuracy = get_accuracy(train_data[train_data.columns[1:]],
train_data[train_data.columns[0]], test_data[test_data.columns[0]])
File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 22, in get_accuracy
result = accuracy_score(y_train, y_test)
File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 172, in accuracy_score
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 72, in _check_targets
check_consistent_length(y_true, y_pred)
File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
"%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [199 299]
I want to get accuracy by method accuracy_score by get this type of error. I Googled by I cannot find anything that can help me. Who can explain me what happens?
sklearn.metrics.accuracy_score()
takesy_true
andy_pred
arguments. That is, for the same data set (presumably the test set), it wants to know the ground truth and the values predicted by your model. This will allow it to evaluate how well your model has performed compared to a hypothetical perfect model.In your code, you are passing the true outcome variables for two different data sets. These outcomes are both truth and in no way reflect your model's ability to correctly classify observations!
Updating your
get_accuracy()
function to also takeX_test
as a parameter, I think this is more in line with what you intended to do: