How do you deal with multi-label classification that has imbalanced labels while training neural networks? One of the solutions I came across was to penalize the error for rare labels more heavily. Here is how I designed the network:
Number of classes: 100. The input layer, the 1st hidden layer, and the 2nd hidden layer (100 units) are fully connected, with dropout and ReLU. The output of the 2nd hidden layer is py_x.
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=py_x, labels=Y))
Where Y is a modified version of the one-hot encoding, with values between 1 and 5 set for all the labels of a sample. The value is ~1 for the most frequent label and ~5 for the rarest labels. The values are not discrete; i.e., the new value set for a label in the one-hot encoding is based on the formula
new value = 1 + 4 * (1 - (percentage of label / 100))
For example, <0, 0, 1, 0, 1, ...> would be converted to something like <0, 0, 1.034, 0, 3.667, ...>. NOTE: only the values of 1 in the original vectors are changed.
This way, if the model incorrectly predicts a rare label, its error is high, e.g. 0.0001 - 5 = -4.9999, and this back-propagates a heavier error than a mislabeling of a very frequent label.
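Concretely, the re-weighting described above could be computed with something like the following (a minimal NumPy sketch; `train_labels` is a placeholder name for the binary multi-hot label matrix and is filled with random data here just so the snippet runs):

import numpy as np

# Minimal sketch of the re-weighting described above.
# `train_labels` stands in for the binary multi-hot label matrix of shape
# (n_samples, 100); here it is random data purely for illustration.
rng = np.random.default_rng(0)
train_labels = (rng.random((1000, 100)) < 0.05).astype(np.float32)

label_pct = 100.0 * train_labels.mean(axis=0)           # percentage of samples carrying each label
label_weight = 1.0 + 4.0 * (1.0 - label_pct / 100.0)    # ~1 for frequent labels, up to 5 for rare ones

# Only the 1s in the original multi-hot vectors are scaled; the 0s stay 0.
Y_weighted = train_labels * label_weight                 # broadcasts over the sample dimension

# Y_weighted would then be fed as the `labels` tensor (Y) in the cost shown above.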
Is this the right way to penalize? Are there any better methods to deal with this problem?
Let me answer your problem in its general form. What you are facing is the class imbalance problem, and there are many ways to tackle it. The most common ones are:
1. Resampling the dataset (over-sampling the rare classes or under-sampling the frequent ones). For example, if you have 5 target classes (classes A to E), and classes A, B, C, and D have 1,000 examples each while class E has only 10, you can simply add 990 more examples of class E (just copy them, or copy them and add some noise). A rough sketch of this follows the list.
2. Weighting the loss function, so that errors on rare classes cost more. This is the method you have used in your code, where you increased the importance (weight) of a class by a factor of at most 5.
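Here is what random over-sampling could look like for the 5-class example above (a rough NumPy sketch with made-up data; names like X, y, and the noise scale are placeholders, and class 4 plays the role of the rare class E):

import numpy as np

# Minimal sketch of random over-sampling for the 5-class example above.
# X and y are placeholder names for the feature matrix and integer class
# labels; class 4 ("E") is deliberately rare.
rng = np.random.default_rng(0)
X = rng.normal(size=(4010, 20))
y = np.concatenate([np.repeat([0, 1, 2, 3], 1000), np.repeat(4, 10)])

target = 1000                                      # desired number of examples per class
rare_idx = np.where(y == 4)[0]                     # the 10 class-E examples
extra = rng.choice(rare_idx, target - len(rare_idx), replace=True)

# Append copies of the rare class, optionally with a little noise added.
X_balanced = np.concatenate([X, X[extra] + 0.01 * rng.normal(size=(len(extra), 20))])
y_balanced = np.concatenate([y, y[extra]])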
Returning to your problem: the first solution is independent of your model. You just need to check whether you are able to change the dataset (add more samples to classes with few samples, or remove samples from classes with many samples). For the second solution, since you are working with a neural network, you have to change your loss function. You can define the class weights (importances) as hyperparameters, train your model, and see which set of weights works better.
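As a variant of the second solution, you can keep Y as the plain binary multi-hot labels and instead multiply the per-label losses by a class-weight vector. A minimal sketch in the same TF 1.x style as your cost line (py_x and Y are placeholders standing in for your logits and labels, and the particular weight values are just an illustrative choice):

import tensorflow as tf

# Class-weighted sigmoid cross-entropy, matching the cost in the question.
n_classes = 100
py_x = tf.placeholder(tf.float32, [None, n_classes])     # logits from the 2nd hidden layer
Y = tf.placeholder(tf.float32, [None, n_classes])        # unmodified binary multi-hot labels

# Hypothetical weights: e.g. weight the 10 rarest labels 5x; tune these as hyperparameters.
class_weights = tf.constant([1.0] * 90 + [5.0] * 10)

per_label_loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=py_x, labels=Y)
cost = tf.reduce_mean(per_label_loss * class_weights)    # weights broadcast over the batch

TensorFlow also provides tf.nn.weighted_cross_entropy_with_logits, which applies a pos_weight only to the positive term of each label; that can be closer to what you want when the problem is specifically that positive examples of some labels are rare.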
So to answer your question: yes, this is a valid way to penalize, but you may get better accuracy by trying different weights (instead of the maximum of 5 in your example). You might also want to try resampling the dataset.
For more information, you can refer to this link.