The output of my neural network is a table of predicted class probabilities for multi-label classification:
print(probabilities)
| | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|--------------|--------------|-----|--------------|--------------|--------------|
| 0 | 2.442745e-05 | 5.952136e-06 | ... | 4.254002e-06 | 1.894523e-05 | 1.033957e-05 |
| 1 | 7.685694e-05 | 3.252202e-06 | ... | 3.617730e-06 | 1.613792e-05 | 7.356643e-06 |
| 2 | 2.296657e-06 | 4.859554e-06 | ... | 9.934525e-06 | 9.244772e-06 | 1.377618e-05 |
| 3 | 5.163169e-04 | 1.044035e-04 | ... | 1.435158e-04 | 2.807420e-04 | 2.346930e-04 |
| 4 | 2.484626e-06 | 2.074290e-06 | ... | 9.958628e-06 | 6.002510e-06 | 8.434519e-06 |
| 5 | 1.297477e-03 | 2.211737e-04 | ... | 1.881772e-04 | 3.171079e-04 | 3.228884e-04 |
I converted it to class labels using a threshold (0.2) to measure the accuracy of my predictions:
predictions = (probabilities > 0.2).astype(int)
print(predictions)
| | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... | 0 | 0 | 0 |
| 1 | 0 | 0 | ... | 0 | 0 | 0 |
| 2 | 0 | 0 | ... | 0 | 0 | 0 |
| 3 | 0 | 0 | ... | 0 | 0 | 0 |
| 4 | 0 | 0 | ... | 0 | 0 | 0 |
| 5 | 0 | 0 | ... | 0 | 0 | 0 |
I also have a test set with the true labels:
print(Y_test)
| | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... | 0 | 0 | 0 |
| 1 | 0 | 0 | ... | 0 | 0 | 0 |
| 2 | 0 | 0 | ... | 0 | 0 | 0 |
| 3 | 0 | 0 | ... | 0 | 0 | 0 |
| 4 | 0 | 0 | ... | 0 | 0 | 0 |
| 5 | 0 | 0 | ... | 0 | 0 | 0 |
Question: How can I build an algorithm in Python that chooses the optimal threshold, i.e. the one that maximizes roc_auc_score(average='micro') or another metric? Perhaps it is possible to write a manual function in Python that optimizes the threshold, depending on the chosen accuracy metric.
I assume your ground-truth labels are `Y_test` and your predictions are `predictions`. Optimizing `roc_auc_score(average='micro')` with respect to a prediction threshold does not seem to make sense, as AUCs are computed based on how predictions are ranked and therefore need `predictions` as float values in [0, 1]. Therefore, I will discuss `accuracy_score`. You could use `scipy.optimize.fmin`:
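A minimal sketch of that `fmin` idea, using synthetic stand-ins for the real `Y_test` and `probabilities` arrays (swap in your own data). One caveat: accuracy is a step function of the threshold, so the Nelder-Mead search behind `fmin` can stall on flat regions; a coarse grid search over candidate thresholds is a more robust fallback.

```python
import numpy as np
from scipy.optimize import fmin
from sklearn.metrics import accuracy_score

# Synthetic stand-ins for the real Y_test / probabilities arrays
rng = np.random.default_rng(0)
Y_test = rng.integers(0, 2, size=(100, 5))
probabilities = np.clip(Y_test * 0.6 + rng.random((100, 5)) * 0.5, 0.0, 1.0)

def neg_accuracy(threshold):
    """Negative (subset) accuracy at the given threshold; fmin minimizes."""
    predictions = (probabilities > threshold).astype(int)
    return -accuracy_score(Y_test, predictions)

# Start the search from the hand-picked 0.2 threshold
best_threshold = fmin(neg_accuracy, x0=0.2, disp=False)[0]
print(best_threshold, -neg_accuracy(best_threshold))
```

The same `neg_accuracy` objective works with any other metric from `sklearn.metrics` (e.g. `f1_score(average='micro')`) by swapping the scoring call.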
The best way to do so is to put a logistic regression on top of your new dataset: consider your network's output a new dataset and train an LR on it. The LR will multiply every probability by a certain constant and thus provide an automatic threshold on the output (with the LR you just need to predict the class, not the probabilities).
You need to train this by subdividing the test set in two, using one part to train the LR after predicting the output with the NN. This is not the only way to do it, but it works fine for me every time.
So we have X_train_nn, X_valid_nn, X_test_NN, and we subdivide X_test_NN into X_train_LR and X_test_LR (or do a stratified K-fold as you wish). Here is a sample of the code:
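A sketch of what that could look like; the data is synthetic, the `X_train_LR` / `X_test_LR` names follow the answer's naming, and `MultiOutputClassifier` is one way (an assumption, not the answerer's original code) to fit one LR per label in the multi-label case.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-ins: NN output probabilities and the matching multi-label targets
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(400, 3))
probs = np.clip(Y * 0.5 + rng.random((400, 3)) * 0.6, 0.0, 1.0)

# Split the NN's test-set predictions: one half trains the LR, the other evaluates it
X_train_LR, X_test_LR, y_train_LR, y_test_LR = train_test_split(
    probs, Y, test_size=0.5, random_state=0)

# One logistic regression per label: each learns a scale/bias on the probabilities,
# which amounts to an automatic per-label threshold
lr = MultiOutputClassifier(LogisticRegression())
lr.fit(X_train_LR, y_train_LR)
labels = lr.predict(X_test_LR)   # hard 0/1 labels, no manual threshold needed
print(accuracy_score(y_test_LR, labels))
```

Because `predict` already returns hard labels, no threshold ever needs to be chosen by hand; the LR's learned bias plays that role.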