OpenAI baselines: Why simultaneously use `tf.stop_

Posted 2019-07-13 07:08

In the OpenAI baselines DQN code, `tf.stop_gradient` is applied to the target network's Q-values when the operation graph is built, so that the target Q-values do not contribute gradients when the loss is minimized (line 213).

However, when `minimize` is called, `var_list` is restricted to the `tf.Variable`s whose scope falls under the Q-network being optimized, which excludes the variables in the target Q-network's scope (line 223).

I'm not sure why they do both. The two approaches seem to achieve the same result.

Tags: machine-learning tensorflow openai-gym

1 answer

神经病院院长

#2 · 2019-07-13 07:31

It's redundant. IMO the code reads better this way: you can see that no gradient will flow through that expression, and you also know exactly which variables will be updated.

Either one alone would indeed suffice to achieve the same effect.
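To make the redundancy concrete, here is a minimal sketch in plain Python rather than TensorFlow. It assumes a toy setup that is not taken from baselines: scalar linear Q-functions with hand-derived gradients, and hypothetical helpers `gradients` and `sgd_step` standing in for `tf.stop_gradient` and for `minimize(..., var_list=...)`. Blocking the gradient is modeled as a zero gradient for the target weight.

```python
# Toy stand-in for the DQN loss (all names here are hypothetical):
#   q(s)         = w   * s       online network, weight w
#   q_target(s') = w_t * s'      target network, weight w_t
#   loss         = (q(s) - (r + gamma * q_target(s')))**2

def gradients(w, w_t, s, r, s_next, gamma, stop_gradient):
    """Return (dloss/dw, dloss/dw_t) for a single transition."""
    td_error = w * s - (r + gamma * w_t * s_next)
    grad_w = 2.0 * td_error * s
    # With stop_gradient the target is treated as a constant,
    # so nothing flows back into w_t (modeled here as a zero gradient).
    grad_w_t = 0.0 if stop_gradient else -2.0 * td_error * gamma * s_next
    return grad_w, grad_w_t


def sgd_step(w, w_t, grads, lr, var_list):
    """Apply one SGD step, but only to the variables named in var_list."""
    grad_w, grad_w_t = grads
    if "w" in var_list:
        w -= lr * grad_w
    if "w_t" in var_list:
        w_t -= lr * grad_w_t
    return w, w_t


transition = dict(s=1.0, r=1.0, s_next=2.0, gamma=0.9)
w0, w_t0, lr = 0.5, 0.3, 0.1

# (a) stop_gradient only: both weights are eligible for updates.
only_stop = sgd_step(
    w0, w_t0,
    gradients(w0, w_t0, stop_gradient=True, **transition),
    lr, var_list=("w", "w_t"),
)

# (b) var_list only: the gradient reaches w_t, but w_t is never updated.
only_var_list = sgd_step(
    w0, w_t0,
    gradients(w0, w_t0, stop_gradient=False, **transition),
    lr, var_list=("w",),
)

# (c) neither safeguard: the target weight gets trained as well.
neither = sgd_step(
    w0, w_t0,
    gradients(w0, w_t0, stop_gradient=False, **transition),
    lr, var_list=("w", "w_t"),
)

print("stop_gradient only:", only_stop)
print("var_list only:     ", only_var_list)
print("neither:           ", neither)
```

In this toy case (a) and (b) leave the target weight untouched and update the online weight identically, while (c) changes the target weight too. That matches the point above: either safeguard alone is enough, and using both mainly documents the intent.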
