I wrote a simple linear regression model and a decision tree model, and both seem to work. My question is how to calculate the accuracy of these two models. In other words, what is the difference between evaluating a classification model and evaluating a regression model? Do I need to split the data into training and test sets?
Until now I was using .score(x_test, y_test),
but I read that this is not the accuracy of the model. I have tried to use the functions in sklearn.metrics instead, but I always get this error:
ValueError: Found input variables with inconsistent numbers of samples: [2, 1]
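As far as I understand the message, the two arguments passed to the metric function contain different numbers of samples (2 in the first, 1 in the second). A minimal sketch that reproduces the same kind of error, assuming made-up values for `y_true` and `y_pred` (these names and numbers are only for illustration and are not taken from my script):

```python
from sklearn.metrics import mean_squared_error

# Hypothetical values: two true outcomes but only one predicted value.
y_true = [268, 646]
y_pred = [300]

try:
    mean_squared_error(y_true, y_pred)
    error_message = None
except ValueError as exc:
    # scikit-learn checks that both inputs have the same number of samples.
    error_message = str(exc)

print(error_message)
```

So the metric apparently needs one prediction for every true value it is compared against.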
Please take a look at my code. I have tried to make it work, but without success.
This is the code:
import pandas as pd
from sklearn import linear_model
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error
dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': [101, 905, 182, 268, 646, 624, 465]}
df = pd.DataFrame(dic)
variables = df.iloc[:,:-1]
results = df.iloc[:,-1]
var_train, var_test, res_train, res_test = train_test_split(variables, results, test_size = 0.2, random_state = 4)
regression = linear_model.LinearRegression()
regression.fit(var_train, res_train)
input_values = [14, 2]
prediction = regression.predict([input_values])
print(prediction)
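# This call raises the ValueError shown above: var_test holds the 2 test rows of inputs, prediction holds 1 value
accuracy_regression = mean_squared_error(var_test, prediction)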
print(accuracy_regression)
dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes']}
df = pd.DataFrame(dic)
variables = df.iloc[:,:-1]
results = df.iloc[:,-1]
var_train, var_test, res_train, res_test = train_test_split(variables, results, test_size = 0.2, random_state = 4)
decision_tree = tree.DecisionTreeClassifier()
decision_tree.fit(var_train, res_train)
input_values = [18, 2]
prediction = decision_tree.predict([input_values])[0]
print(prediction)
accuracy_classification = accuracy_score(res_test, prediction)
print(accuracy_classification)
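For comparison, here is a hedged sketch of what evaluating both models on the held-out test split might look like. It assumes the metric should compare the true test outcomes with predictions made for the same test rows, rather than with a single hand-entered input. The variable names with `reg_` and `clf_` prefixes, and the `random_state=0` passed to the tree, are my own choices for the sketch. With only 7 rows the test split has 2 samples, so the numbers themselves will not mean much.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score

features = pd.DataFrame({'par_1': [10, 30, 13, 19, 25, 33, 23],
                         'par_2': [1, 3, 1, 2, 3, 3, 2]})
numeric_outcome = pd.Series([101, 905, 182, 268, 646, 624, 465])
label_outcome = pd.Series(['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes'])

# --- Regression: error-based metrics on the test split ---
reg_var_train, reg_var_test, reg_res_train, reg_res_test = train_test_split(
    features, numeric_outcome, test_size=0.2, random_state=4)
regression = LinearRegression().fit(reg_var_train, reg_res_train)

# One prediction per test row, so both arguments have the same length.
reg_pred = regression.predict(reg_var_test)
mse = mean_squared_error(reg_res_test, reg_pred)
r2 = r2_score(reg_res_test, reg_pred)
# For regressors, .score() returns R^2, not a classification accuracy.
reg_score = regression.score(reg_var_test, reg_res_test)
print('MSE:', mse, 'R^2:', r2)

# --- Classification: fraction of correctly predicted labels ---
clf_var_train, clf_var_test, clf_res_train, clf_res_test = train_test_split(
    features, label_outcome, test_size=0.2, random_state=4)
decision_tree = DecisionTreeClassifier(random_state=0).fit(clf_var_train, clf_res_train)

clf_pred = decision_tree.predict(clf_var_test)
accuracy = accuracy_score(clf_res_test, clf_pred)
# For classifiers, .score() returns the mean accuracy.
clf_score = decision_tree.score(clf_var_test, clf_res_test)
print('Accuracy:', accuracy)
```

In this sketch the regression model is judged by how far its predictions are from the true numbers (MSE, R^2), while the classifier is judged by how many labels it gets exactly right.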