Is it possible to retrieve False Positives/ False

Posted 2019-07-13 02:52

Question:

I am using scikit-learn and a confusion matrix to get more insight into how my algorithm is performing:

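from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hold out a third of the data for testing
X_train, X_test, Y_train, Y_test = train_test_split(
    keywords_list, label_list, test_size=0.33, random_state=42)

pipeline.fit(X_train, Y_train)
pred = pipeline.predict(X_test)
print(confusion_matrix(Y_test, pred))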

I am getting an output like this:

[[1011   72]
[ 154 1380]]

I assume this follows the usual format for these matrices:

TP|FP
FN|TN
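For reference, scikit-learn documents a different layout from the one assumed above: rows are true labels, columns are predicted labels, and labels appear in sorted order, so a binary matrix reads [[TN, FP], [FN, TP]]. A minimal sketch, using hypothetical toy labels (1 as the positive class) rather than the data from this question:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels: 1 is the positive class, 0 the negative class.
y_true = np.array([0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 0, 1, 0, 1, 1])

# Rows are true labels, columns are predicted labels, in sorted label order.
cm = confusion_matrix(y_true, y_pred)

# For a binary problem, ravel() flattens the 2x2 matrix to (tn, fp, fn, tp).
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

With string labels such as Ham/Spam, sorted order puts Ham first, so Spam would play the role of the positive class in this layout.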

Is it possible to retrieve the samples that are being classified as false positives and false negatives? Knowing what that data looks like would help my work. It goes without saying that I am new to scikit-learn.

EDIT:

Alessandro gave good advice by informing me that Y_test != pred would return all of my false positives/negatives in the confusion matrix.

One factor that I should have mentioned in my original question is that I am classifying textual data under binary labels (e.g. Ham/Spam), and I want to identify false positives and false negatives separately from each other. My current code for extracting false negatives looks like this:

false_neg = open('false_neg.csv', 'w')
falsen_list = X_test[(Y_test == 'Spam') and (pred == 'Ham')] #False Negatives
wr2 = csv.writer(false_neg, quoting=csv.QUOTE_ALL)
for x in falsen_list:
    wr2.writerow([x])

Unfortunately, this throws an error:

  Traceback (most recent call last):
  File "/home/noname365/PycharmProjects/MLCorpusBlacklist/CorpusML_training.py", line 171, in <module>
    falsen_list = X_test[(Y_test == 'blacklisted') and (pred == 'clean')] #False Negatives
  File "/home/noname365/virtualenvs/env35/lib/python3.5/site-packages/pandas/core/generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Am I on the right track here?

Answer 1:

Y_test != pred gives you the samples you predicted incorrectly. In particular, (Y_test == 1) == (pred == 0) should give you the false positives and (Y_test == 0) == (pred == 1) should give you the false negatives (or it could be the other way around, depending on what is positive and negative in your setup).
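A minimal sketch of these masks on hypothetical toy arrays (0/1 labels with 1 as the positive class; the sample names are made up). Note that comparing two masks with == is True wherever the two conditions agree, so in a binary setup it selects the same rows as Y_test != pred; combining the conditions with element-wise & is what isolates one kind of error:

```python
import numpy as np

# Hypothetical toy data: 1 is the positive class, 0 the negative class.
X_test = np.array(["a", "b", "c", "d", "e", "f"])
Y_test = np.array([1, 0, 1, 0, 1, 0])
pred = np.array([1, 1, 0, 0, 1, 1])

# Every misclassified sample, false positives and false negatives together.
errors = Y_test != pred

# Comparing the masks with == gives the same rows as `errors` here.
mixed = (Y_test == 1) == (pred == 0)

# Element-wise AND isolates each kind of error.
false_pos = (Y_test == 0) & (pred == 1)  # truly negative, predicted positive
false_neg = (Y_test == 1) & (pred == 0)  # truly positive, predicted negative

print(X_test[errors])
print(X_test[false_pos])
print(X_test[false_neg])
```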



Answer 2:

What worked for me was using '&' in place of '==' in Alessandro's answer (his version returned both false positives and false negatives together):

(Y_test == 1) & (pred == 0)

Hope it helps.
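Applied to the string labels in the question, the same idea might look like the sketch below. The ValueError in the traceback comes from Python's `and`, which asks pandas for a single truth value for the whole Series; `&` compares element by element, and each comparison needs its own parentheses because `&` binds more tightly than `==`. The sample texts, index values, and the in-memory buffer are assumptions made for illustration:

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical toy data; the shuffled index mimics train_test_split output.
idx = [10, 3, 7, 1]
X_test = pd.Series(["win cash now", "lunch at noon?", "free prize", "see you soon"], index=idx)
Y_test = pd.Series(["Spam", "Ham", "Spam", "Ham"], index=idx)
pred = np.array(["Ham", "Ham", "Spam", "Spam"])  # predict() returns an array

# Element-wise AND; `and` here would raise the ValueError from the question.
false_neg_mask = (Y_test == "Spam") & (pred == "Ham")  # Spam predicted as Ham
false_pos_mask = (Y_test == "Ham") & (pred == "Spam")  # Ham predicted as Spam

# Written to an in-memory buffer here; swap in
# open('false_neg.csv', 'w', newline='') to write to disk.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
for text in X_test[false_neg_mask]:
    writer.writerow([text])

print(buffer.getvalue())
```

Which of the two masks counts as "false negative" depends on which label is treated as positive; Spam is assumed to be the positive class here.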