OpenAI baselines: Why simultaneously use `tf.stop_gradient` and specify `var_list` in `minimize`?

Posted 2019-07-13 07:08

In the OpenAI baselines DQN code, `tf.stop_gradient` is applied to the target network's Q values while building the operation graph, so that the target Q values contribute no gradients to the minimization of the loss (line 213).
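For context, here is a minimal sketch of that pattern (simplified and paraphrased, not the actual baselines code; `q_net`, `obs_t`, and the other names are placeholders of my own choosing):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

n_actions, gamma = 2, 0.99
obs_t   = tf.placeholder(tf.float32, [None, 4])   # current observation
obs_tp1 = tf.placeholder(tf.float32, [None, 4])   # next observation
rew_t   = tf.placeholder(tf.float32, [None])
done_t  = tf.placeholder(tf.float32, [None])      # 1.0 if episode ended

def q_net(x, scope):
    # A toy one-layer Q function; baselines uses a deeper model.
    with tf.variable_scope(scope):
        return tf.layers.dense(x, n_actions)

q_t   = q_net(obs_t,   "q_func")         # online network
q_tp1 = q_net(obs_tp1, "target_q_func")  # target network

# Bellman target; stop_gradient makes backprop treat it as a constant,
# so no gradient can flow back into the target network's variables.
q_tp1_best = (1.0 - done_t) * tf.reduce_max(q_tp1, axis=1)
td_target = tf.stop_gradient(rew_t + gamma * q_tp1_best)
```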

However, when calling `minimize`, `var_list` is specified as only the `tf.Variable`s whose scope falls under the Q network being optimized, excluding the variables scoped under the target Q network (line 223).
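And a sketch of the corresponding `minimize` call, continuing the snippet above (again with made-up names; baselines actually builds the error from a Huber loss over the Q values of the selected actions):

```python
# The loss is minimized only over the online network's variables,
# selected by scope, so even without stop_gradient this optimizer
# step could never update the target network.
q_t_selected = tf.reduce_max(q_t, axis=1)  # stand-in for gathering Q(s, a)
loss = tf.reduce_mean(tf.square(q_t_selected - td_target))

q_func_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                scope="q_func")
optimize_op = tf.train.AdamOptimizer(1e-4).minimize(loss,
                                                    var_list=q_func_vars)
```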

I'm not sure why they do both. The two approaches seem to achieve the same result.

1 Answer
神经病院院长
Answered 2019-07-13 07:31

It's redundant. IMO the code reads better this way: you can see immediately that no gradient will flow through that expression, and you also know exactly which variables the optimizer will update.

Either one alone would indeed suffice to achieve the equivalent effect.
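One way to convince yourself, reusing the placeholder names from the sketch in the question: ask TensorFlow for the loss's gradients with respect to the target network's variables, and it returns `None` for every one of them, because `stop_gradient` has already severed the path.

```python
# No edge in the graph connects loss to the target network's variables,
# so tf.gradients reports a missing gradient (None) for each of them.
target_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                scope="target_q_func")
print(tf.gradients(loss, target_vars))  # [None, None]: no gradient path
```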
