I am using scikit-learn and a confusion matrix to get more insight into how my algorithm is performing:
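from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# pipeline is the estimator built earlier in the script (not shown)
X_train, X_test, Y_train, Y_test = train_test_split(
    keywords_list, label_list, test_size=0.33, random_state=42)
pipeline.fit(X_train, Y_train)
pred = pipeline.predict(X_test)
print(confusion_matrix(Y_test, pred))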
I am getting an output like this:
[[1011   72]
 [ 154 1380]]
I assume this follows the usual format for these matrices:
TP|FP
FN|TN
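A quick way to check that assumption is sketched below. It relies on the documented scikit-learn convention that rows are true labels and columns are predicted labels, in sorted label order, so a binary matrix flattens to TN, FP, FN, TP. The toy labels are invented for illustration and are not my real data.

```python
from sklearn.metrics import confusion_matrix

# Toy labels, invented for illustration: 0 = negative class, 1 = positive class.
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)

# Rows are true labels and columns are predicted labels, in sorted label
# order, so for binary labels the flattened matrix reads TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

With string labels such as Ham/Spam the sorted order decides which class lands in the first row; passing the `labels` argument to `confusion_matrix` makes that order explicit.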
Is it possible to retrieve the samples that are being classified as false positives and false negatives? Knowing what that data looks like would be helpful for my work. It goes without saying that I am new to scikit-learn.
EDIT:
Alessandro gave good advice by pointing out that Y_test != pred would pick out all of the false positives and false negatives counted in the confusion matrix.
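As a minimal sketch of that suggestion, assuming X_test and Y_test are pandas Series and pred is a NumPy array of the same length (the toy values below are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-ins for X_test, Y_test and pred (invented for illustration).
X_test = pd.Series(["cheap pills", "meeting at 10", "win a prize", "lunch?"])
Y_test = pd.Series(["Spam", "Ham", "Spam", "Ham"])
pred = np.array(["Spam", "Spam", "Ham", "Ham"])

# Element-wise comparison gives a boolean mask that is True wherever
# the prediction disagrees with the true label.
wrong = Y_test != pred

# Indexing the inputs with the mask keeps only the misclassified samples.
misclassified = X_test[wrong]
print(misclassified.tolist())
```

This mask mixes both kinds of error together, which is why it does not yet separate false positives from false negatives.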
One thing I should have mentioned in my original question is that I am classifying textual data under binary labels (e.g. Ham/Spam), and I want to identify the false positives and the false negatives separately from each other. My current code for extracting the false negatives looks like this:
false_neg = open('false_neg.csv', 'w')
falsen_list = X_test[(Y_test == 'Spam') and (pred == 'Ham')] #False Negatives
wr2 = csv.writer(false_neg, quoting=csv.QUOTE_ALL)
for x in falsen_list:
    wr2.writerow([x])
Unfortunately, this throws an error:
Traceback (most recent call last):
  File "/home/noname365/PycharmProjects/MLCorpusBlacklist/CorpusML_training.py", line 171, in <module>
    falsen_list = X_test[(Y_test == 'blacklisted') and (pred == 'clean')] #False Negatives
  File "/home/noname365/virtualenvs/env35/lib/python3.5/site-packages/pandas/core/generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Am I on the right track here?
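One likely cause, sketched here under the assumption that X_test and Y_test are pandas Series sharing an index and that pred is a NumPy array: Python's `and` asks each operand for a single truth value, which pandas refuses to give for a whole Series, whereas `&` combines the two masks element by element. The toy values and the shuffled index are invented for illustration, with Spam treated as the positive class.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (invented for illustration); the non-sequential index
# mimics the shuffled index a train/test split leaves behind.
idx = [7, 2, 5, 0]
X_test = pd.Series(["cheap pills", "meeting at 10", "win a prize", "lunch?"], index=idx)
Y_test = pd.Series(["Spam", "Ham", "Spam", "Ham"], index=idx)
pred = np.array(["Spam", "Spam", "Ham", "Ham"])

# `&` is the element-wise AND. The parentheses are required because
# `&` binds more tightly than `==`.
false_neg = X_test[(Y_test == "Spam") & (pred == "Ham")]   # Spam predicted as Ham
false_pos = X_test[(Y_test == "Ham") & (pred == "Spam")]   # Ham predicted as Spam

print(false_neg.tolist())
print(false_pos.tolist())
```

Either result can then be iterated over in the existing csv.writer loop, one row per sample.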