Why should continuous actions be clamped?

Posted 2019-07-24 17:29

In Deep Reinforcement Learning with continuous action spaces, why does it seem to be common practice to clamp the action right before the agent executes it?

Examples:

OpenAI Gym Mountain Car https://github.com/openai/gym/blob/master/gym/envs/classic_control/continuous_mountain_car.py#L57

Unity 3DBall https://github.com/Unity-Technologies/ml-agents/blob/master/unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs#L29

Isn't information lost by doing so? For example, if the model outputs +10 for velocity, which is then clamped to +1, the executed action behaves rather discretely: every output beyond the bound maps onto the same extreme value. For fine-grained movement, wouldn't it make more sense to scale the output instead, e.g. multiply it by 0.1? (See the sketch below for the difference.)
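A minimal NumPy sketch of the concern; the action range [-1, 1] and the raw policy outputs are made up for illustration:

```python
import numpy as np

# Hypothetical raw outputs from an unbounded policy network.
raw = np.array([0.5, 2.0, 10.0])

# Hard clamp into the valid range [-1, 1]: every value beyond the
# bound collapses onto the bound, so 2.0 and 10.0 become
# indistinguishable after clamping.
clamped = np.clip(raw, -1.0, 1.0)   # -> [0.5, 1.0, 1.0]

# Scaling by 0.1 instead: relative differences are preserved, but the
# bound is only respected for raw outputs inside [-10, 10]; a raw
# 20.0 would still overshoot the valid range.
scaled = raw * 0.1                  # -> [0.05, 0.2, 1.0]
```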

1 Answer

叛逆 · Answered 2019-07-24 18:05

This is probably done simply to enforce constraints on what the agent can physically do. The agent might want to output an action that increases velocity by 1,000,000, but if the agent is a self-driving car with a weak engine that can accelerate by at most 1 unit, it doesn't matter that the agent would hypothetically like to accelerate by more: the car's engine has limited capabilities, and the clamp models that limit.
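To see this from the environment's side, here is a minimal sketch in the spirit of the Mountain Car line linked in the question; the class, its dynamics, and the limit value are all hypothetical:

```python
import numpy as np

class WeakEngineCar:
    """Toy environment sketch: the engine physically cannot exceed
    1 unit of acceleration, so the requested action is clamped no
    matter what the agent outputs."""

    MAX_ACCEL = 1.0  # hard physical limit of the engine

    def __init__(self):
        self.velocity = 0.0

    def step(self, action):
        # Same clamping pattern as the linked Mountain Car code:
        # requesting +1,000,000 has the same effect as requesting +1.
        accel = float(np.clip(action, -self.MAX_ACCEL, self.MAX_ACCEL))
        self.velocity += accel
        return self.velocity


env = WeakEngineCar()
print(env.step(1_000_000.0))  # 1.0: the weak engine caps acceleration
```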
