I am new to machine learning and to scikit-learn.
My problem:
(Please correct any misconceptions.)
I have a dataset which is a big JSON file; I retrieve it and store it in a trainList variable.
I pre-process it in order to be able to work with it.
Once I have done that, I start the classification:
- I use the k-fold cross-validation method in order to obtain the mean accuracy, and I train a classifier.
- I make the predictions and obtain the accuracy and confusion matrix of that fold.
- After this, I would like to obtain the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values. I would use these parameters to obtain the sensitivity and the specificity, and I would send them and the total number of TPs to an HTML page in order to show a chart with the TPs of each label.
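(For reference, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP).)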
Code:
The variables I have for the moment:
trainList #It is a list with all the data of my dataset in JSON form
labelList #It is a list with all the labels of my data
Most of the method:
#Imports used below (vec and qda are created earlier in my code;
#vec is my vectorizer and qda is my QDA classifier)
from sklearn import preprocessing
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, confusion_matrix

#I transform the data from JSON form to a numerical one
X = vec.fit_transform(trainList)

#I scale the matrix (don't know why, but without it I get an error)
X = preprocessing.scale(X.toarray())

#I generate a KFold in order to make cross validation
kf = KFold(n_splits=10, shuffle=True, random_state=1)

#I start the cross validation
for train_indices, test_indices in kf.split(X):
    X_train = [X[ii] for ii in train_indices]
    X_test = [X[ii] for ii in test_indices]
    y_train = [labelList[ii] for ii in train_indices]
    y_test = [labelList[ii] for ii in test_indices]

    #I train the classifier
    trained = qda.fit(X_train, y_train)

    #I make the predictions
    predicted = qda.predict(X_test)

    #I obtain the accuracy of this fold
    ac = accuracy_score(y_test, predicted)

    #I obtain the confusion matrix
    cm = confusion_matrix(y_test, predicted)

    #I should calculate the TP, TN, FP and FN
    #I don't know how to continue
You can try sklearn.metrics.classification_report, as below.
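A minimal sketch with made-up labels (the exact layout of the printed report depends on your scikit-learn version):

from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))

Output (per-class rows shown; newer versions also print accuracy and averaged rows):

             precision    recall  f1-score   support

    class 0       0.50      1.00      0.67         1
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.67      0.80         3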
I wrote a version that works using only numpy. I hope it helps you.
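A minimal sketch, assuming binary labels encoded as 0/1 (the function name perf_measure is just an example):

import numpy as np

def perf_measure(y_actual, y_pred):
    #Count each of the four outcomes with boolean masks
    y_actual = np.asarray(y_actual)
    y_pred = np.asarray(y_pred)
    TP = np.sum((y_actual == 1) & (y_pred == 1))
    TN = np.sum((y_actual == 0) & (y_pred == 0))
    FP = np.sum((y_actual == 0) & (y_pred == 1))
    FN = np.sum((y_actual == 1) & (y_pred == 0))
    return TP, FP, TN, FN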
The one-liner to get true positives etc. out of the confusion matrix is to ravel it.
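A minimal sketch for binary labels (y_test and predicted are the variables from the question; scikit-learn returns the raveled values in the order TN, FP, FN, TP):

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, predicted).ravel()
print(tn, fp, fn, tp)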
You can obtain all of the parameters from the confusion matrix. In scikit-learn the rows of the confusion matrix are the actual classes and the columns are the predicted classes, so for a binary problem it is a 2x2 matrix.
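A sketch of its layout and how to index it (assuming labels 0 and 1, with cm being the matrix returned by confusion_matrix(y_test, predicted) in the question):

#              predicted 0   predicted 1
# actual 0         TN            FP
# actual 1         FN            TP

TN = cm[0][0]
FP = cm[0][1]
FN = cm[1][0]
TP = cm[1][1]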
More details at https://en.wikipedia.org/wiki/Confusion_matrix
I don't think either of those answers is fully correct. For example, suppose that we have the following arrays:
y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
If we compute the FP, FN, TP and TN values manually, they should be as follows:
FP: 3 FN: 1 TP: 3 TN: 4
However, if we use the first answer, the results are given as follows:
FP: 1 FN: 3 TP: 3 TN: 4
These are not correct, because in the first answer a False Positive should be counted where the actual value is 0 but the predicted value is 1, not the opposite. The same applies to False Negative.
And if we use the second answer, the results are computed as follows:
FP: 3 FN: 1 TP: 4 TN: 3
The True Positive and True Negative numbers are not correct; they should be swapped.
Am I correct with my computations? Please let me know if I am missing something.
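To double-check the manual counts, here is a quick sketch with numpy (variable names match the arrays above):

import numpy as np

y_actual = np.array([1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0])
y_predic = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0])

TP = np.sum((y_actual == 1) & (y_predic == 1))
TN = np.sum((y_actual == 0) & (y_predic == 0))
FP = np.sum((y_actual == 0) & (y_predic == 1))
FN = np.sum((y_actual == 1) & (y_predic == 0))

print(FP, FN, TP, TN)   # 3 1 3 4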
In scikit-learn's metrics module there is a confusion_matrix function which gives you the desired output.
You can use any classifier that you want; here I use KNeighborsClassifier as an example.
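A minimal sketch (X_train, X_test, y_train and y_test are placeholders for your own split; n_neighbors=3 is just an example):

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
print(cm)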
The docs: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix