Accuracy of multivariate classification and regression

Posted 2019-08-18 00:28

Question:

I wrote one simple linear regression model and one decision tree model, and they work well. My question is: how do I calculate the accuracy of these two models? In other words, what's the difference between calculating the accuracy of classification models and regression models? Do I need to split the data into train and test sets?

Until now, I was using .score(x_test, y_test), but I read that this is not the accuracy of the model. I have tried to use sklearn.metrics, but I always get this error:

ValueError: Found input variables with inconsistent numbers of samples: [2, 1]

Please check out my code. I have tried to make it work, but I failed.

This is the code:

import pandas as pd
from sklearn import linear_model
from sklearn import tree
from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error


dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': [101, 905, 182, 268, 646, 624, 465]}

df = pd.DataFrame(dic)

variables = df.iloc[:,:-1]
results = df.iloc[:,-1]

var_train, var_test, res_train, res_test = train_test_split(variables, results, test_size = 0.2, random_state = 4)

regression = linear_model.LinearRegression()
regression.fit(var_train, res_train)

input_values = [14, 2]

prediction = regression.predict([input_values])
print(prediction)

accuracy_regression = mean_squared_error(var_test, prediction)
print(accuracy_regression)


dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes']}

df = pd.DataFrame(dic)

variables = df.iloc[:,:-1]
results = df.iloc[:,-1]

var_train, var_test, res_train, res_test = train_test_split(variables, results, test_size = 0.2, random_state = 4)

decision_tree = tree.DecisionTreeClassifier()
decision_tree.fit(var_train, res_train)

input_values = [18, 2]

prediction = decision_tree.predict([input_values])[0]
print(prediction)

accuracy_classification = accuracy_score(res_test, prediction)
print(accuracy_classification)

Answer 1:

Accuracy is a metric for classification, not for regression. For regression you can use R squared, (negative) mean squared error, etc. Accuracy is defined as the number of correctly classified data points divided by the total number of data points. It is not used for continuous targets, because a continuous prediction almost never equals the true value exactly, so even a very good regression model would score close to zero.
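A minimal hand-rolled sketch of the two kinds of metric, to show why accuracy does not carry over to regression. The helper names `accuracy` and `mse` and all sample values are hypothetical, chosen only for illustration:

```python
# Accuracy: fraction of predictions that exactly match the true label.
def accuracy(y_true, y_pred):
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# Mean squared error: average squared distance between truth and prediction.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification: labels either match or they don't.
labels_true = ['yes', 'no', 'yes', 'no']
labels_pred = ['yes', 'no', 'no', 'no']
print(accuracy(labels_true, labels_pred))  # 0.75 (3 of 4 correct)

# Regression: hypothetical predictions that are very close but never exact.
values_true = [101.0, 905.0]
values_pred = [100.5, 904.0]
print(accuracy(values_true, values_pred))  # 0.0, no exact matches
print(mse(values_true, values_pred))       # (0.25 + 1.0) / 2 = 0.625
```

The regression predictions are nearly perfect, yet "accuracy" reports 0.0, while the mean squared error correctly reports a small error.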

You can use the following metrics to measure the predictive performance of a regression model: https://scikit-learn.org/stable/modules/classes.html#regression-metrics For example, you can compute R squared using

metrics.r2_score(y_true, y_pred[, …])
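As a rough sketch of what R squared measures, the snippet below computes it by hand as 1 - SS_res / SS_tot (residual sum of squares divided by the total sum of squares around the mean) and compares the result with metrics.r2_score. The sample values are made up for illustration and are not from the question:

```python
import numpy as np
from sklearn import metrics

# Illustrative values only (not from the question's data).
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Residual sum of squares: error remaining after the model's predictions.
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: error of always predicting the mean of y_true.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

r2_sklearn = metrics.r2_score(y_true, y_pred)
print(r2_manual, r2_sklearn)  # both roughly 0.9486
```

An R squared of 1 means perfect predictions, 0 means no better than always predicting the mean, and it can be negative for a model that does worse than that.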

Likewise, the following metrics can be used for a classification model: https://scikit-learn.org/stable/modules/classes.html#classification-metrics Accuracy can be computed using

metrics.accuracy_score(y_true, y_pred[, …])
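A small sketch of accuracy_score on made-up labels. It also shows, under the assumption that the error comes from scikit-learn's usual input-length check, that passing true values and predictions of different lengths triggers the same kind of ValueError reported in the question:

```python
from sklearn.metrics import accuracy_score

# Illustrative labels only.
y_true = ['yes', 'no', 'yes', 'no']
y_pred = ['yes', 'no', 'no', 'no']

# Fraction of correctly classified samples (3 of 4).
acc = accuracy_score(y_true, y_pred)
print(acc)  # 0.75

# normalize=False returns the number of correct predictions instead.
n_correct = accuracy_score(y_true, y_pred, normalize=False)
print(n_correct)

# Both arguments must describe the same samples; a length mismatch
# (2 true labels vs 1 prediction) raises a ValueError.
mismatch_message = None
try:
    accuracy_score(['yes', 'no'], ['yes'])
except ValueError as err:
    mismatch_message = str(err)
print(mismatch_message)
```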

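In your case, you can compute R squared for the regression model by predicting on the whole test set and comparing those predictions with the true test targets res_test (not the features var_test). Your call mean_squared_error(var_test, prediction) compares 2 rows of features with a single prediction, which is what raises the ValueError:

from sklearn import metrics

# R squared on the held-out test set
y_pred_test = regression.predict(var_test)
metrics.r2_score(res_test, y_pred_test)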
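Putting it together for the regression data in the question, a sketch of a full evaluation might look like the following. It reuses the question's dictionary and split; the names res_pred, r2, mse and new_point are illustrative additions. It also shows that for a regressor, .score(X, y) returns R squared, so the value you were getting from .score was R squared rather than accuracy:

```python
import pandas as pd
from sklearn import linear_model, metrics
from sklearn.model_selection import train_test_split

dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': [101, 905, 182, 268, 646, 624, 465]}
df = pd.DataFrame(dic)
variables = df.iloc[:, :-1]
results = df.iloc[:, -1]

var_train, var_test, res_train, res_test = train_test_split(
    variables, results, test_size=0.2, random_state=4)

regression = linear_model.LinearRegression()
regression.fit(var_train, res_train)

# Predict every held-out row, so predictions line up 1:1 with res_test.
res_pred = regression.predict(var_test)

r2 = metrics.r2_score(res_test, res_pred)
mse = metrics.mean_squared_error(res_test, res_pred)
print("R^2:", r2)
print("MSE:", mse)

# For regressors, .score(X, y) returns R^2 on that data.
print("score:", regression.score(var_test, res_test))

# Predicting a single new point is separate from evaluating the model.
new_point = pd.DataFrame([[14, 2]], columns=variables.columns)
print(regression.predict(new_point))
```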

And the following gives you the accuracy of your decision tree on the test set:

from sklearn import metrics

# Accuracy on the held-out test set
y_pred_test = decision_tree.predict(var_test)
metrics.accuracy_score(res_test, y_pred_test)
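The same pattern for the classification data in the question, as a sketch. The random_state=0 on the tree is an added assumption to make the run reproducible, and res_pred, accuracy and new_point are illustrative names. For a classifier, .score(X, y) returns the mean accuracy, so it matches accuracy_score:

```python
import pandas as pd
from sklearn import metrics, tree
from sklearn.model_selection import train_test_split

dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes']}
df = pd.DataFrame(dic)
variables = df.iloc[:, :-1]
results = df.iloc[:, -1]

var_train, var_test, res_train, res_test = train_test_split(
    variables, results, test_size=0.2, random_state=4)

# random_state=0 is an added assumption for reproducibility.
decision_tree = tree.DecisionTreeClassifier(random_state=0)
decision_tree.fit(var_train, res_train)

# One prediction per held-out row, aligned with res_test.
res_pred = decision_tree.predict(var_test)
accuracy = metrics.accuracy_score(res_test, res_pred)
print("accuracy:", accuracy)

# For classifiers, .score(X, y) returns this same mean accuracy.
print("score:", decision_tree.score(var_test, res_test))

# Classifying one new point is separate from evaluating the model.
new_point = pd.DataFrame([[18, 2]], columns=variables.columns)
print(decision_tree.predict(new_point)[0])
```

With only 7 rows, test_size=0.2 leaves 2 test samples, so the accuracy can only be 0, 0.5 or 1. It is a very coarse estimate; on real data a larger held-out set (or cross-validation) gives a more reliable number.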