When I set epsilon=10e-8, AdamOptimizer doesn't work. When I set it to 1, it works just fine.
The epsilon term is there to avoid a divide-by-zero error when updating a variable whose gradient is almost zero (see the update rule below). So, ideally, epsilon should be a small value. But a small epsilon in the denominator makes the weight updates larger, and with subsequent normalization larger weights will always be normalized to 1.
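For reference, this is the parameter update used by Adam (as given in the original Adam paper); epsilon sits in the denominator:

$$\theta_{t+1} \;=\; \theta_t \;-\; \frac{\alpha\,\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected estimates of the first and second moments of the gradient, and $\alpha$ is the learning rate.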
So, my guess is that when you train with a small epsilon, the optimizer becomes unstable.
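A rough numerical sketch of why (plain Python, with purely hypothetical values): when the gradient of a parameter has been tiny for a while, sqrt(v_hat) is also tiny, so with a small epsilon the step stays close to the full learning rate no matter how small the gradient is, while a large epsilon damps it:

```python
# Hypothetical values: a parameter whose gradient has hovered around 1e-6.
alpha = 0.001          # learning rate
m_hat = 1e-6           # bias-corrected first moment (mean gradient)
v_hat = 1e-12          # bias-corrected second moment (mean squared gradient)

for eps in (1e-8, 1e-4, 1.0):
    step = alpha * m_hat / (v_hat ** 0.5 + eps)
    print(f"epsilon={eps:g} -> step size ~ {step:.2e}")

# epsilon=1e-08  -> step size ~ 9.90e-04   (almost the full learning rate)
# epsilon=0.0001 -> step size ~ 9.90e-06
# epsilon=1      -> step size ~ 1.00e-09   (heavily damped)
```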
The trade-off is that the bigger you make epsilon (and thus the denominator), the smaller the weight updates become, and so the slower training progresses. Most of the time you want the denominator to be able to get small. Usually, an epsilon value greater than 10e-4 performs better.
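If you want to experiment with this, epsilon is exposed as a constructor argument. A minimal sketch, assuming the tf.keras Adam optimizer (the old tf.train.AdamOptimizer named in the question takes an epsilon argument as well); the values here are only illustrative:

```python
import tensorflow as tf

# epsilon defaults to a small value in tf.keras (1e-7); raising it damps the
# effective per-parameter step and makes the optimizer less aggressive.
small_eps = tf.keras.optimizers.Adam(learning_rate=1e-3, epsilon=1e-8)
large_eps = tf.keras.optimizers.Adam(learning_rate=1e-3, epsilon=1.0)

# Pass whichever you want to compare into model.compile(optimizer=...).
```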