I'm using linear_model.LinearRegression from scikit-learn as a predictive model. Fitting and predicting work fine, but I have a problem evaluating the predicted results with the accuracy_score metric.
This is my true data:
array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
And this is my predicted data:
array([ 0.07094605, 0.1994941 , 0.19270157, 0.13379635, 0.04654469,
0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
0.25390082, 0.17248324])
My code:
accuracy_score(y_true, y_pred, normalize=False)
And this is the error message:
"ValueError: Can't handle mix of binary and continuous"
Help? Thank you.
accuracy_score(y_true, y_pred.round(), normalize=False)
If you prefer more control over the threshold, use
(y_pred > threshold).astype(int)
instead of y_pred.round(), where threshold is the value that separates the two classes.
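Applied to the arrays from the question, a minimal sketch (the 0.15 threshold is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# The asker's data, reproduced from the question.
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

# Round to the nearest label (0 or 1) before scoring.
# normalize=False returns the count of correct labels instead of a fraction.
print(accuracy_score(y_true, y_pred.round(), normalize=False))  # -> 11

# Or pick your own cut-off between the two classes.
threshold = 0.15  # arbitrary value for illustration
print(accuracy_score(y_true, (y_pred > threshold).astype(int)))
```

With `y_pred.round()` every prediction here falls below 0.5, so all rounded labels are 0 and only the true zeros are counted as correct.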
The sklearn.metrics.accuracy_score(y_true, y_pred) method defines y_pred as:
y_pred : 1d array-like, or label indicator array / sparse matrix. Predicted labels, as returned by a classifier.
Which means y_pred has to be an array of 1's or 0's (predicted labels). It should not contain probabilities or other continuous values.
Predicted labels (1's and 0's) and predicted probabilities can be generated with a classifier such as LogisticRegression(), using its predict() and predict_proba() methods respectively. (Note that LinearRegression() is a regressor: its predict() returns continuous values, and it has no predict_proba() method.)
1. Generate predicted labels:
LR = linear_model.LogisticRegression()
LR.fit(X_train, y_train)
y_preds = LR.predict(X_test)
print(y_preds)
output:
[1 1 0 1]
'y_preds' can now be used for the accuracy_score() method: accuracy_score(y_true, y_preds)
2. Generate probabilities for labels:
Some metrics, such as precision_recall_curve(y_true, probas_pred), require probabilities, which can be generated as follows:
LR = linear_model.LogisticRegression()
LR.fit(X_train, y_train)
y_probs = LR.predict_proba(X_test)[:, 1]  # probability of the positive class
print(y_probs)
output:
[0.87812372 0.77490434 0.30319547 0.84999743]
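For completeness, a self-contained sketch on a toy dataset (the data and variable names are made up for illustration) showing both kinds of output side by side:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy data, invented for illustration: one feature, binary target.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))
y = (X[:, 0] > 0).astype(int)

clf = LogisticRegression().fit(X, y)

labels = clf.predict(X)              # hard 0/1 labels -> suitable for accuracy_score
probs = clf.predict_proba(X)[:, 1]   # P(class == 1)   -> for precision_recall_curve etc.

print(accuracy_score(y, labels))
print(probs[:3])
```

The same fitted model produces both: predict() applies a 0.5 probability cut-off internally, while predict_proba() exposes the raw probabilities.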
The problem is that the true y is binary (zeros and ones), while your predictions are not. You probably generated probabilities rather than class predictions, hence the error :)
Try generating class membership instead, and it should work!
Maybe this helps someone who finds this question:
As JohnnyQ already pointed out, the problem is that you have non-binary (neither 0 nor 1) values in your y_pred, i.e. when adding
print(((y_pred != 0.) & (y_pred != 1.)).any())
you will see True in the output. (The command checks whether there is any value that is not 0 or 1.)
You can see your non-binary values using the following (this assumes y_pred is a pandas DataFrame with a 'score' column):
non_binary_values = y_pred[(y_pred['score'] != 1) & (y_pred['score'] != 0)]
non_binary_idxs = y_pred[(y_pred['score'] != 1) & (y_pred['score'] != 0)].index
A print statement can output the variables derived above.
Finally, this function can clean your data of all non-binary entries:
def remove_unlabelled_data(X, y):
    drop_indexes = X[(y['score'] != 1) & (y['score'] != 0)].index
    return X.drop(drop_indexes), y.drop(drop_indexes)
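A quick usage sketch, with hypothetical DataFrames matching this answer's assumption that y has a 'score' column:

```python
import pandas as pd

# Invented example frames; the 0.7 row stands in for an unlabelled entry.
X = pd.DataFrame({"feature": [0.1, 0.4, 0.35, 0.8]})
y = pd.DataFrame({"score": [1, 0.7, 0, 1]})

def remove_unlabelled_data(X, y):
    # Drop every row whose score is neither exactly 0 nor exactly 1.
    drop_indexes = X[(y['score'] != 1) & (y['score'] != 0)].index
    return X.drop(drop_indexes), y.drop(drop_indexes)

X_clean, y_clean = remove_unlabelled_data(X, y)
print(y_clean)  # only the rows with score 0 or 1 remain
```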
accuracy_score is a classification metric; you cannot use it for a regression problem.
You can see the available regression metrics in the scikit-learn metrics documentation.
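For example, the asker's continuous predictions can be evaluated directly with regression metrics such as mean_squared_error and r2_score:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# The asker's data, reproduced from the question.
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

print(mean_squared_error(y_true, y_pred))  # lower is better
print(r2_score(y_true, y_pred))            # 1.0 would be a perfect fit
```

No rounding or thresholding is needed here, because these metrics compare continuous values directly.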