The output of my neural network is a table of predicted class probabilities for multi-label classification:
print(probabilities)
| | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|--------------|--------------|-----|--------------|--------------|--------------|
| 0 | 2.442745e-05 | 5.952136e-06 | ... | 4.254002e-06 | 1.894523e-05 | 1.033957e-05 |
| 1 | 7.685694e-05 | 3.252202e-06 | ... | 3.617730e-06 | 1.613792e-05 | 7.356643e-06 |
| 2 | 2.296657e-06 | 4.859554e-06 | ... | 9.934525e-06 | 9.244772e-06 | 1.377618e-05 |
| 3 | 5.163169e-04 | 1.044035e-04 | ... | 1.435158e-04 | 2.807420e-04 | 2.346930e-04 |
| 4 | 2.484626e-06 | 2.074290e-06 | ... | 9.958628e-06 | 6.002510e-06 | 8.434519e-06 |
| 5 | 1.297477e-03 | 2.211737e-04 | ... | 1.881772e-04 | 3.171079e-04 | 3.228884e-04 |
I converted it to class labels using a threshold (0.2) to measure the accuracy of my prediction:
predictions = (probabilities > 0.2).astype(int)
print(predictions)
| | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... | 0 | 0 | 0 |
| 1 | 0 | 0 | ... | 0 | 0 | 0 |
| 2 | 0 | 0 | ... | 0 | 0 | 0 |
| 3 | 0 | 0 | ... | 0 | 0 | 0 |
| 4 | 0 | 0 | ... | 0 | 0 | 0 |
| 5 | 0 | 0 | ... | 0 | 0 | 0 |
I also have a test set:
print(Y_test)
| | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... | 0 | 0 | 0 |
| 1 | 0 | 0 | ... | 0 | 0 | 0 |
| 2 | 0 | 0 | ... | 0 | 0 | 0 |
| 3 | 0 | 0 | ... | 0 | 0 | 0 |
| 4 | 0 | 0 | ... | 0 | 0 | 0 |
| 5 | 0 | 0 | ... | 0 | 0 | 0 |
Question: how can I build an algorithm in Python that chooses the optimal threshold to maximize roc_auc_score(average='micro'), or another metric?
Maybe it is possible to write a custom function in Python that optimizes the threshold for a given accuracy metric.
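One caveat: roc_auc_score is computed from the raw probabilities, so its value does not change with the threshold; a thresholded metric such as micro-averaged F1 is a more natural target for this kind of search. Below is a minimal sketch of a brute-force threshold sweep using scikit-learn's f1_score. The helper name find_best_threshold and the candidate grid are illustrative choices, and it assumes probabilities and Y_test are the DataFrames shown above.

```python
import numpy as np
from sklearn.metrics import f1_score

def find_best_threshold(y_true, probs, thresholds=np.arange(0.05, 0.95, 0.01)):
    """Sweep candidate thresholds and return (best_threshold, best_score)
    for micro-averaged F1 on the given true/predicted label matrices."""
    best_threshold, best_score = 0.5, -1.0
    for t in thresholds:
        preds = (probs > t).astype(int)          # binarize at this threshold
        score = f1_score(y_true, preds, average='micro', zero_division=0)
        if score > best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

best_t, best_f1 = find_best_threshold(Y_test.values, probabilities.values)
print(f"best threshold: {best_t:.2f}, micro F1: {best_f1:.4f}")

# Re-binarize the predictions with the tuned threshold.
predictions = (probabilities > best_t).astype(int)
```

The same loop can be run column by column if you want a separate threshold per label instead of one global threshold, and f1_score can be swapped for any other metric that takes binarized predictions.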