I can't figure out if I've set up my binary classification problem correctly. I labeled the positive class 1 and the negative class 0. However, it is my understanding that by default scikit-learn uses class 0 as the positive class in its confusion matrix (so the inverse of how I set it up). This is confusing to me. Is the top row, in scikit-learn's default setting, the positive or the negative class? Let's assume the confusion matrix output:
confusion_matrix(y_test, preds)
[[30  5]
 [ 2 42]]
What would it look like as a confusion matrix? Are the actual instances the rows or the columns in scikit-learn?
             prediction                        prediction
              0     1                           1     0
            ----- -----                       ----- -----
         0 | TN  | FP          (OR)        1 | TP  | FP
 actual     ----- -----            actual     ----- -----
         1 | FN  | TP                      0 | FN  | TN
Short answer: In binary classification, when using the argument labels=[0, 1], the class labels 0 and 1 are considered Negative and Positive, respectively. This is due to the order implied by the list, not the alpha-numerical order.

Verification: Consider imbalanced class labels like the ones below (using imbalanced classes to make the distinction easier); the confusion table this gives you, and the TN/FP/FN/TP cells it corresponds to, are shown in the sketch that follows:
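The original example arrays are not preserved here, so the arrays below are an illustrative reconstruction, chosen only so that the counts discussed next (FN = 2, TN = 12) fall out of the matrix:

import numpy as np
from sklearn.metrics import confusion_matrix

# Imbalanced toy data: 14 negatives (0) and 4 positives (1).
y_true = np.array([0] * 14 + [1] * 4)
# Predictions: 12 negatives kept as 0, 2 negatives flipped to 1,
# 2 positives missed as 0, 2 positives correctly flagged as 1.
y_pred = np.array([0] * 12 + [1] * 2 + [0] * 2 + [1] * 2)

print(confusion_matrix(y_true, y_pred, labels=[0, 1]))
# [[12  2]     rows = actual, columns = predicted
#  [ 2  2]]
#
# which corresponds to:
#   TN | FP     12 | 2
#   -------  =  ------
#   FN | TP      2 | 2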
where FN=2 means that there were 2 cases where the model predicted the sample to be negative (i.e., 0) but the actual label was positive (i.e., 1), hence False Negatives equal 2. Similarly for TN=12: in 12 cases the model correctly predicted the negative class (0), hence True Negatives equal 12. This way everything adds up, assuming that sklearn considers the first label (in labels=[0, 1]) as the negative class. Therefore, here, 0, the first label, represents the negative class.

scikit-learn sorts labels in ascending order, so the 0s are the first column/row and the 1s are the second.
This is written in the docs: if labels is not given, the labels that appear at least once in y_true or y_pred are used in sorted order. Thus you can alter this behavior by providing labels to the confusion_matrix call.
And actual/predicted are ordered just like in your tables above: predictions are in the columns and actual values in the rows.
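For example, a minimal sketch with made-up arrays (not the asker's y_test/preds, which aren't shown) illustrating both the default ordering and an explicit labels argument that flips it:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 1, 1])

# Default: labels are sorted ascending, so the layout is
# [[TN, FP],
#  [FN, TP]]  with 0 treated as the negative class.
print(confusion_matrix(y_true, y_pred))
# [[2 1]
#  [1 3]]

# Passing labels explicitly reorders the matrix, e.g. class 1 first:
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[3 1]
#  [1 2]]

# Convenient unpacking of the default binary layout:
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 3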