Accuracy Score: ValueError: Can't Handle mix

Posted 2019-01-22 06:33

Question:

I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It works and it's perfect. I have a problem evaluating the predicted results using the accuracy_score metric. This is my true data:

array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])

And this is my predicted data:

array([ 0.07094605,  0.1994941 ,  0.19270157,  0.13379635,  0.04654469,
    0.09212494,  0.19952108,  0.12884365,  0.15685076, -0.01274453,
    0.32167554,  0.32167554, -0.10023553,  0.09819648, -0.06755516,
    0.25390082,  0.17248324])

My code:

accuracy_score(y_true, y_pred, normalize=False)

And this is the error message:

"ValueError: Can't handle mix of binary and continuous"

Help? Thank you.

Answer 1:

accuracy_score(y_true, y_pred.round(), normalize=False)

If you prefer to have more control over the threshold, use (y_pred>threshold).astype(int) instead of y_pred.round(), where threshold is the value that separates the two classes.
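A minimal NumPy sketch of both options on the data from the question. The threshold 0.19 is an arbitrary value chosen only for illustration, and the count of matching entries is the number of correctly classified samples, which is what accuracy_score(..., normalize=False) reports.

```python
import numpy as np

# Data from the question
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

# Option 1: round to the nearest integer (all values here are below 0.5 -> 0)
labels_round = y_pred.round()

# Option 2: explicit threshold; 0.19 is an arbitrary illustrative value
threshold = 0.19
labels_thresh = (y_pred > threshold).astype(int)

# Number of correct predictions for each option
correct_round = int((labels_round == y_true).sum())
correct_thresh = int((labels_thresh == y_true).sum())
print(correct_round, correct_thresh)  # 11 13
```

The threshold clearly changes the result. Note also that round() maps values above 1.5 to 2 and values below -0.5 to -1, so for unbounded regression outputs an explicit threshold is the safer choice.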



Answer 2:

The documentation of sklearn.metrics.accuracy_score(y_true, y_pred) defines y_pred as:

y_pred : 1d array-like, or label indicator array / sparse matrix. Predicted labels, as returned by a classifier.

This means y_pred has to be an array of 1's and 0's (predicted labels), not probabilities.

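Predicted labels (1's and 0's) and predicted probabilities can be generated with a classifier's predict() and predict_proba() methods respectively. Note that LinearRegression is a regressor, not a classifier: it has no predict_proba(), and its predict() returns continuous values like the ones in the question. A classifier such as linear_model.LogisticRegression provides both methods:

1. Generate predicted labels:

LR = linear_model.LogisticRegression()
LR.fit(X_train, y_train)  # X_train, y_train: your training data
y_preds = LR.predict(X_test)
print(y_preds)

output:

[1 1 0 1]

'y_preds' can now be used with the accuracy_score() function: accuracy_score(y_true, y_preds)

2. Generate probabilities for labels:

Some metrics, such as precision_recall_curve(y_true, probas_pred), require probabilities (or other continuous scores), which can be generated as follows:

LR = linear_model.LogisticRegression()
LR.fit(X_train, y_train)
y_probs = LR.predict_proba(X_test)[:, 1]  # column 1 = probability of class 1
print(y_probs)

output:

[0.87812372 0.77490434 0.30319547 0.84999743]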
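A self-contained sketch of both steps, under stated assumptions: the tiny one-feature training and test sets below are hypothetical, and LogisticRegression stands in for LinearRegression because only a classifier offers predict_proba(). The probabilities come back with one column per class, in the order of clf.classes_.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_curve

# Hypothetical toy data: one feature, binary target
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.5], [1.5], [3.5], [4.5]])
y_true = np.array([0, 0, 1, 1])

clf = LogisticRegression()
clf.fit(X_train, y_train)

# 1. Class labels: drawn from clf.classes_, so accuracy_score accepts them
y_preds = clf.predict(X_test)
acc = accuracy_score(y_true, y_preds)
n_correct = accuracy_score(y_true, y_preds, normalize=False)

# 2. Probabilities: shape (n_samples, n_classes); column 1 is P(label == 1)
proba = clf.predict_proba(X_test)
y_scores = proba[:, 1]

# precision_recall_curve works on continuous scores, not hard labels
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

print(y_preds, acc, n_correct)
print(y_scores)
```

The split to keep in mind: accuracy_score gets the hard labels (y_preds), while precision_recall_curve gets the scores (y_scores).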



Answer 3:

The problem is that the true y is binary (zeros and ones), while your predictions are not. You probably generated probabilities or other continuous scores rather than class predictions, hence the error :) Try instead to generate class memberships, and it should work!
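The binary vs. continuous distinction can be checked directly with sklearn.utils.multiclass.type_of_target, the helper scikit-learn uses to infer the target type of y_true and y_pred before a classification metric compares them. A minimal sketch on the question's data; the threshold 0.19 is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Data from the question
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

# Integer 0/1 labels are inferred as "binary"
print(type_of_target(y_true))    # binary
# Non-integer floats are inferred as "continuous": the rejected "mix"
print(type_of_target(y_pred))    # continuous

# Thresholding turns the scores into class memberships, which are "binary" again
y_labels = (y_pred > 0.19).astype(int)
print(type_of_target(y_labels))  # binary
```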



Answer 4:

Maybe this helps someone who finds this question:

As JohnnyQ already pointed out, the problem is that you have non-binary values (neither 0 nor 1) in your y_pred, i.e., when adding

print(((y_pred != 0.) & (y_pred != 1.)).any())

you will see True in the output (the expression checks whether any value is neither 0 nor 1).
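For a plain NumPy array like the one in the question, the same mask also gives the positions of the offending values. A minimal sketch; np.flatnonzero is standard NumPy, and the variable names are only illustrative.

```python
import numpy as np

# Predictions from the question
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

# Boolean mask: True where a value is neither 0 nor 1
non_binary_mask = (y_pred != 0.) & (y_pred != 1.)
print(non_binary_mask.any())  # True

# Positions and values of the non-binary entries
non_binary_idxs = np.flatnonzero(non_binary_mask)
non_binary_values = y_pred[non_binary_idxs]
print(len(non_binary_idxs))  # 17: every prediction is non-binary here

# After thresholding, the same check comes back False
labels = (y_pred > 0.5).astype(int)
print(((labels != 0) & (labels != 1)).any())  # False
```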

If y_pred is a pandas DataFrame with the predictions in a 'score' column, you can see the non-binary values and their indexes using:

non_binary_mask = (y_pred['score'] != 1) & (y_pred['score'] != 0)
non_binary_values = y_pred[non_binary_mask]
non_binary_idxs = non_binary_values.index

Printing these variables shows the offending rows.

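Finally, this function removes every row whose label is non-binary from both X and y (X and y are assumed to be DataFrames sharing the same index, with the labels in y's 'score' column):

def remove_unlabelled_data(X, y):
    # Index labels of the rows whose score is neither 0 nor 1
    drop_indexes = X[(y['score'] != 1) & (y['score'] != 0)].index
    return X.drop(drop_indexes), y.drop(drop_indexes)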
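A small runnable check of the function on hypothetical DataFrames: X has an illustrative 'feature' column, y uses the 'score' column the answer assumes, and both share the default RangeIndex so the boolean mask built from y lines up with the rows of X.

```python
import pandas as pd

def remove_unlabelled_data(X, y):
    # Index labels of the rows whose score is neither 0 nor 1
    drop_indexes = X[(y['score'] != 1) & (y['score'] != 0)].index
    return X.drop(drop_indexes), y.drop(drop_indexes)

# Hypothetical data: row 2 carries a non-binary score
X = pd.DataFrame({'feature': [10, 20, 30, 40]})
y = pd.DataFrame({'score': [1, 0, 0.7, 1]})

# Inspect the non-binary rows first
non_binary_values = y[(y['score'] != 1) & (y['score'] != 0)]
print(non_binary_values.index.tolist())  # [2]

X_clean, y_clean = remove_unlabelled_data(X, y)
print(X_clean.index.tolist())  # [0, 1, 3]
```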


Answer 5:

accuracy_score is a classification metric; you cannot use it for a regression problem.

You can see the available regression metrics here
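For the regression framing, scikit-learn's regression metrics accept the continuous predictions as they are, with no thresholding. A minimal sketch on the data from the question, using mean_squared_error, mean_absolute_error and r2_score from sklearn.metrics:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Data from the question
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([0.07094605, 0.1994941, 0.19270157, 0.13379635, 0.04654469,
                   0.09212494, 0.19952108, 0.12884365, 0.15685076, -0.01274453,
                   0.32167554, 0.32167554, -0.10023553, 0.09819648, -0.06755516,
                   0.25390082, 0.17248324])

# Regression metrics compare continuous values directly
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mse, mae, r2)
```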