When I set epsilon=10e-8, AdamOptimizer doesn't work. When I set it to 1, it works just fine.
The epsilon term is there to avoid a divide-by-zero error when updating a variable whose gradient is almost zero (see the update rule below). So, ideally, epsilon should be a small value. But a small epsilon in the denominator makes the weight updates larger, and with subsequent normalization larger weights will always be normalized to 1.
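For reference, this is the parameter update used by Adam (as given in the original Adam paper); epsilon sits in the denominator:

$$\theta_{t+1} \;=\; \theta_t \;-\; \frac{\alpha\,\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected estimates of the first and second moments of the gradient, and $\alpha$ is the learning rate.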
So, my guess is that when you train with a small epsilon, the optimizer becomes unstable.
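A rough numerical sketch of why (plain Python, with purely hypothetical values): when the gradient of a parameter has been tiny for a while, sqrt(v_hat) is also tiny, so with a small epsilon the step stays close to the full learning rate no matter how small the gradient is, while a large epsilon damps it:

```python
# Hypothetical values: a parameter whose gradient has hovered around 1e-6.
alpha = 0.001          # learning rate
m_hat = 1e-6           # bias-corrected first moment (mean gradient)
v_hat = 1e-12          # bias-corrected second moment (mean squared gradient)

for eps in (1e-8, 1e-4, 1.0):
    step = alpha * m_hat / (v_hat ** 0.5 + eps)
    print(f"epsilon={eps:g} -> step size ~ {step:.2e}")

# epsilon=1e-08  -> step size ~ 9.90e-04   (almost the full learning rate)
# epsilon=0.0001 -> step size ~ 9.90e-06
# epsilon=1      -> step size ~ 1.00e-09   (heavily damped)
```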
The trade-off is that the bigger you make epsilon (and thus the denominator), the smaller the weight updates become, and so the slower training progresses. Most of the time you want the denominator to be able to get small. Usually, an epsilon value greater than 10e-4 performs better.
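If you want to experiment with this, epsilon is exposed as a constructor argument. A minimal sketch, assuming the tf.keras Adam optimizer (the old tf.train.AdamOptimizer named in the question takes an epsilon argument as well); the values here are only illustrative:

```python
import tensorflow as tf

# epsilon defaults to a small value in tf.keras (1e-7); raising it damps the
# effective per-parameter step and makes the optimizer less aggressive.
small_eps = tf.keras.optimizers.Adam(learning_rate=1e-3, epsilon=1e-8)
large_eps = tf.keras.optimizers.Adam(learning_rate=1e-3, epsilon=1.0)

# Pass whichever you want to compare into model.compile(optimizer=...).
```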