Relationship between loss and accuracy

Posted 2020-02-25 23:46

Question:

Is it practically possible for both the loss and the accuracy to decrease at each epoch when training a CNN model? I am seeing this behavior while training.

Can someone explain the possible reasons why this is happening?

Answer 1:

There are at least 5 reasons which might cause such behavior:

  1. Outliers: imagine that you have 10 exactly identical images, 9 of which belong to class A and one to class B. In this case, the model will start to assign a high probability for class A to this example, because class A is in the majority. But then the signal from the outlier may destabilize the model and make the accuracy decrease. In theory, the model should eventually settle on assigning a 90% score to class A, but that may take many epochs.

    Solutions: To deal with such examples, I advise you to use gradient clipping (you can add this option to your optimizer). If you want to check whether this phenomenon occurs, examine the distribution of losses of individual examples from the training set and look for outliers.
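
    For example, assuming a TensorFlow/Keras setup (the question does not state the framework), gradient clipping can be enabled directly on the optimizer; the values below are purely illustrative:

```python
import tensorflow as tf

# Minimal sketch, assuming tf.keras; clipnorm=1.0 is an illustrative threshold,
# not a recommendation.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-3,
    clipnorm=1.0,     # clip each gradient tensor to an L2 norm of at most 1.0
    # clipvalue=0.5,  # alternative: clip each gradient element to [-0.5, 0.5]
)

# Stand-in CNN just to make the snippet runnable; use your own model here.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=optimizer,
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```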

  2. Bias: Now imagine that you have 10 exactly identical images, but 5 of them are labeled class A and 5 are labeled class B. In this case, the model will try to assign an approximately 50%-50% distribution over these two classes. On such examples the model can achieve at most 50% accuracy, since it has to choose one of the two valid classes.

    Solution: Try to increase the model capacity - very often you have a set of really similar images, and adding expressive power may help the model discriminate between them. Beware of overfitting though. Another solution is to try this strategy in your training. If you want to check whether this phenomenon occurs, check the distribution of losses of individual examples: if the distribution is skewed toward high values, you are probably suffering from bias. The snippet below sketches this check (and also covers the outlier check from point 1).
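
    A minimal sketch of that check, assuming the tf.keras model from the snippet above and one-hot labels (the data here is a random placeholder for your own training set):

```python
import numpy as np
import tensorflow as tf

# Per-example loss check; replace the random data with your own x_train / y_train.
x_train = np.random.rand(100, 32, 32, 3).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 10, 100), 10)

probs = model.predict(x_train, batch_size=256)
loss_fn = tf.keras.losses.CategoricalCrossentropy(reduction="none")  # keep per-sample losses
per_example_loss = loss_fn(y_train, probs).numpy()

# A heavy right tail suggests outliers; a distribution shifted toward high values suggests bias.
print("mean loss:", per_example_loss.mean())
print("95th / 99th percentile:", np.percentile(per_example_loss, [95, 99]))
print("10 highest-loss examples:", np.argsort(per_example_loss)[-10:])
```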

  3. Class imbalance: Now imagine that 90% of your images belong to class A. In the early stages of training, the model mainly concentrates on assigning this class to almost all examples. This can push individual losses (on the minority class) to really high values and destabilize the model by making the predicted distribution more volatile.

    Solution: once again, gradient clipping. Second, patience - simply leave your model training for more epochs; it should pick up more subtle patterns in the later phases of training. And of course, try class balancing, either by assigning sample_weight or class_weight. If you want to check whether this phenomenon occurs, check your class distribution, e.g. as sketched below.
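
    A sketch of the class-distribution check with a simple inverse-frequency class_weight, assuming tf.keras (the label array below is a placeholder for your own labels):

```python
import numpy as np

# Class-distribution check and inverse-frequency class weighting.
y_labels = np.random.randint(0, 10, 1000)   # placeholder integer class ids
counts = np.bincount(y_labels)
print("class distribution:", counts / counts.sum())

# Rare classes get proportionally larger weights.
class_weight = {cls: len(y_labels) / (len(counts) * count)
                for cls, count in enumerate(counts)}

# In tf.keras the weights are passed to fit(), e.g.:
# model.fit(x_train, y_train, epochs=10, class_weight=class_weight)
```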

  4. Too strong regularization: if you set your regularization too strict, the training process concentrates mainly on shrinking the norm of your weights rather than on actually learning anything interesting.

    Solution: add categorical_crossentropy as a metric and observe whether it is also decreasing. If not, your regularization is too strict - try assigning a smaller weight penalty.
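
    A minimal sketch of both ideas, assuming tf.keras; the l2 factor is an illustrative, mild penalty:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# The reported `loss` includes the regularization penalty, while the
# categorical_crossentropy metric tracks only the data-fit term, so the
# two can be compared directly during training.
model = tf.keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3),
                  kernel_regularizer=regularizers.l2(1e-5)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy", "categorical_crossentropy"],
)
```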

  5. Bad model design: such behavior might also be caused by a poor model design. There are several good practices one can apply to improve a model:

    Batch Normalization - this technique prevents radical changes in the inner network activations, which makes training much more stable and efficient. With a small batch size, it can also act as a genuine regularizer (a minimal sketch appears after this list).

    Gradient clipping - this makes your model training much more stable and efficient.

    Reduce the bottleneck effect - read this fantastic paper and check whether your model might suffer from the bottleneck problem.

    Add auxiliary classifiers - if you are training your network from scratch, these should make your features more meaningful and your training faster and more efficient.
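
    As an illustration of the batch normalization point above, a minimal convolutional block in tf.keras might look like the following (tf.keras itself and the layer sizes are assumptions, not from the question):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Conv block with batch normalization. Normalizing before the nonlinearity is one
# common arrangement; the original answer does not prescribe a specific order.
def conv_bn_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)  # BN supplies the shift, so no bias
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return x

inputs = tf.keras.Input(shape=(32, 32, 3))
x = conv_bn_block(inputs, 32)
x = layers.MaxPooling2D()(x)
x = conv_bn_block(x, 64)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```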



Answer 2:

Yes, this is possible.

To provide an intuitive example of why this can happen, suppose that your classifier outputs roughly the same probability for classes A and B, with A slightly ahead. In this setting, a minimal change in the model's parameters can turn B into the most probable class. This change makes the cross-entropy loss vary only minimally, since the loss depends directly on the probability values, but it is clearly visible in the accuracy, because accuracy depends only on the argmax of the output probability distribution.

In conclusion, minimizing the cross-entropy loss does not always imply improving the accuracy, mainly because cross-entropy is a smooth function, while accuracy is non-smooth.
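
A tiny numeric sketch of this effect, using made-up probabilities for the true class of a single two-class example:

```python
import numpy as np

def cross_entropy(p_true):
    """Cross-entropy contribution of one example, given the probability of its true class."""
    return -np.log(p_true)

# Made-up probabilities for the true class of a single example.
before = 0.51   # argmax picks the true class  -> counted as correct
after = 0.49    # argmax picks the other class -> counted as wrong

print("loss change:", cross_entropy(after) - cross_entropy(before))  # ~0.04, barely moves
print("accuracy for this example: 1 -> 0")                           # a full flip
```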



Answer 3:

It is possible to get decreasing loss together with decreasing accuracy, but that is far from being a good model. This problem can be resolved to some extent by using batch normalization at every conv layer of the model.



Answer 4:

This is possible because the loss function also accounts for the confidence of the prediction, while accuracy only accounts for correctness. The following Excel sheet shows an example: on the left side both loss and accuracy are low; on the right side accuracy increases while the loss increases as well.

Check the spreadsheet to try it yourself.

The same holds for multi-class classification using the softmax function:

  • Softmax cross-entropy as the loss function
  • Loss will be low if the probability assigned to the true class is high
  • When predicting, we select the class with the highest probability
  • So the model can learn to increase accuracy even with low confidence for the selected class, as long as its probability is higher than that of every other class. Doing this increases the loss, yet accuracy can still go up (see the sketch below).
  • But on the training set the loss generally follows a decreasing trend (it can fluctuate when using mini-batch gradient descent)
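
Since the spreadsheet is not reproduced here, the following NumPy sketch illustrates the same idea with made-up softmax outputs (the numbers are not taken from the original sheet):

```python
import numpy as np

# Made-up softmax outputs for 3 examples; the true class is index 0 for all of them.
y_true = np.array([0, 0, 0])

confident = np.array([[0.9, 0.05, 0.05],
                      [0.8, 0.10, 0.10],
                      [0.2, 0.40, 0.40]])   # 2 of 3 correct, high confidence

hesitant = np.array([[0.4, 0.3, 0.3],
                     [0.4, 0.3, 0.3],
                     [0.4, 0.3, 0.3]])      # 3 of 3 correct, low confidence

def mean_cross_entropy(probs, labels):
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def accuracy(probs, labels):
    return np.mean(np.argmax(probs, axis=1) == labels)

# The hesitant predictions have a higher accuracy AND a higher loss.
for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, "loss:", round(mean_cross_entropy(probs, y_true), 3),
          "accuracy:", round(accuracy(probs, y_true), 3))
```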

Hope this makes it clear why this is possible. This is my intuition; if someone thinks this is not correct, your feedback is welcome.