Scikit-learn confusion matrix

Posted 2019-03-26 13:38

I can't figure out whether I've set up my binary classification problem correctly. I labeled the positive class 1 and the negative class 0. However, it is my understanding that by default scikit-learn uses class 0 as the positive class in its confusion matrix (i.e., the inverse of how I set it up). This is confusing to me. In scikit-learn's default setting, is the top row the positive or the negative class? Let's assume the confusion matrix output:

confusion_matrix(y_test, preds)
array([[30,  5],
       [ 2, 42]])

How would it look like in a confusion matrix? Are the actual instances the rows or the columns in scikit-learn?

          prediction                        prediction
           0       1                          1       0
         -----   -----                      -----   -----
      0 | TN   |  FP        (OR)         1 |  TP  |  FP
actual   -----   -----             actual   -----   -----
      1 | FN   |  TP                     0 |  FN  |  TN

2 Answers
霸刀☆藐视天下
2019-03-26 14:15

Short answer: In binary classification, when using the labels argument,

confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0], labels=[0,1]).ravel()

the class labels 0 and 1 are treated as Negative and Positive, respectively. This order comes from the list you pass, not from alpha-numerical sorting.
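In the binary case, ravel() flattens the 2x2 matrix into (tn, fp, fn, tp), so the four counts can be unpacked directly. A minimal sketch with the call above:

```python
from sklearn.metrics import confusion_matrix

# labels=[0, 1] makes 0 the negative class and 1 the positive class;
# ravel() then yields the counts in the order (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0], labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 0 2 1 1
```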


Verification: Consider an imbalanced set of class labels like this (the imbalance makes the distinction easier):

>>> y_true = [0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0]
>>> y_pred = [0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0]
>>> table = confusion_matrix(y_true, y_pred, labels=[0,1]).ravel()

This would give you a confusion table as follows:

>>> table
array([12,  1,  2,  1])

which corresponds to (in scikit-learn's layout, rows = actual, columns = predicted):

                 Predicted
             |   0    |   1   |
          ----------------------
actual  0 |  TN=12  |  FP=1 |
        1 |  FN=2   |  TP=1 |

ravel() flattens this row by row, giving (TN, FP, FN, TP) = (12, 1, 2, 1).

where FN=2 means that there were 2 cases where the model predicted the sample to be negative (i.e., 0) but the actual label was positive (i.e., 1), hence False Negative equals 2.

Similarly for TN=12, in 12 cases the model correctly predicted the negative class (0), hence True Negative equals 12.

This way everything adds up, assuming that sklearn considers the first label in labels=[0,1] as the negative class. Therefore, here, 0, the first label, represents the negative class.
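To illustrate that the order really comes from the labels list rather than from sorting, a small sketch using the same data as above: reversing the list reverses the roles of the four counts.

```python
from sklearn.metrics import confusion_matrix

y_true = [0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0]
y_pred = [0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0]

# labels=[0, 1]: first label is the negative class, so ravel() -> (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)      # 12 1 2 1

# labels=[1, 0]: first label is now 1, so ravel() -> (tp, fn, fp, tn)
tp2, fn2, fp2, tn2 = confusion_matrix(y_true, y_pred, labels=[1, 0]).ravel()
print(tp2, fn2, fp2, tn2)  # 1 2 1 12
```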

Juvenile、少年°
2019-03-26 14:20

scikit-learn sorts labels in ascending order, so the 0s occupy the first row/column and the 1s the second:

>>> from sklearn.metrics import confusion_matrix as cm
>>> y_test = [1, 0, 0]
>>> y_pred = [1, 0, 0]
>>> cm(y_test, y_pred)
array([[2, 0],
       [0, 1]])
>>> y_pred = [4, 0, 0]
>>> y_test = [4, 0, 0]
>>> cm(y_test, y_pred)
array([[2, 0],
       [0, 1]])
>>> y_test = [-2, 0, 0]
>>> y_pred = [-2, 0, 0]
>>> cm(y_test, y_pred)
array([[1, 0],
       [0, 2]])

This is written in the docs:

labels : array, shape = [n_classes], optional List of labels to index the matrix. This may be used to reorder or select a subset of labels. If none is given, those that appear at least once in y_true or y_pred are used in sorted order.

Thus you can alter this behavior by passing labels to the confusion_matrix call:

>>> y_test = [1, 0, 0]
>>> y_pred = [1, 0, 0]
>>> cm(y_test, y_pred)
array([[2, 0],
       [0, 1]])
>>> cm(y_test, y_pred, labels=[1, 0])
array([[1, 0],
       [0, 2]])

And actual/predicted are ordered just as in your diagram: predictions are in columns and actual values in rows.

>>> y_test = [5, 5, 5, 0, 0, 0]
>>> y_pred = [5, 0, 0, 0, 0, 0]
>>> cm(y_test, y_pred)
array([[3, 0],
       [2, 1]])
  • true: 0, predicted: 0 (value: 3, position [0, 0])
  • true: 5, predicted: 0 (value: 2, position [1, 0])
  • true: 0, predicted: 5 (value: 0, position [0, 1])
  • true: 5, predicted: 5 (value: 1, position [1, 1])
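The position mapping in the bullets above can be checked programmatically. A minimal sketch that walks the matrix cell by cell (np.unique returns the same sorted label order that confusion_matrix uses by default):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_test = [5, 5, 5, 0, 0, 0]
y_pred = [5, 0, 0, 0, 0, 0]

labels = np.unique(y_test + y_pred)  # sorted ascending: [0, 5]
m = confusion_matrix(y_test, y_pred)

# Row index = actual label, column index = predicted label
for i, true in enumerate(labels):
    for j, pred in enumerate(labels):
        print(f"true: {true}, predicted: {pred} -> {m[i, j]} (position [{i}, {j}])")
```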