What is a policy in reinforcement learning? [closed]

I've seen definitions such as:

A policy defines the learning agent's way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.

But I still don't fully understand it. What exactly is a policy in reinforcement learning?

Tags: terminology reinforcement-learning

3 answers

霸刀☆藐视天下

Post #2 · 2019-03-13 22:12

The definition is correct, though not instantly obvious if you see it for the first time. Let me put it this way: a policy is an agent's strategy.

For example, imagine a world where a robot moves across the room and the task is to get to the target point (x, y), where it gets a reward. Here:

The room is the environment
The robot's current position is the state
A policy is what the agent does to accomplish this task:
- dumb robots just wander around randomly until they accidentally end up in the right place (policy #1)
- others may, for some reason, learn to go along the walls most of the route (policy #2)
- smart robots plan the route in their "head" and go straight to the goal (policy #3)
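To make the robot example concrete, here is a minimal sketch of policy #1 and policy #3 as plain functions from a state to an action. The 5x5 room, the goal at (3, 2), the move names, and the `run_episode` helper are all assumptions made up for illustration; they are not part of the original example.

```python
import random

GOAL = (3, 2)  # hypothetical target point (x, y)
MOVES = {"left": (-1, 0), "right": (1, 0), "down": (0, -1), "up": (0, 1)}

def random_policy(state, rng):
    """Policy #1: ignore the state and wander randomly."""
    return rng.choice(list(MOVES))

def direct_policy(state, rng):
    """Policy #3: head straight for the goal, along x first, then y."""
    x, y = state
    if x != GOAL[0]:
        return "right" if x < GOAL[0] else "left"
    return "up" if y < GOAL[1] else "down"

def run_episode(policy, start=(0, 0), max_steps=10_000, seed=0):
    """Follow the policy from start; return the number of steps taken."""
    rng = random.Random(seed)
    state = start
    for step in range(max_steps):
        if state == GOAL:
            return step
        dx, dy = MOVES[policy(state, rng)]
        # Clamp to an assumed 5x5 room so the robot cannot leave it.
        state = (min(max(state[0] + dx, 0), 4), min(max(state[1] + dy, 0), 4))
    return max_steps

direct_steps = run_episode(direct_policy)
random_steps = run_episode(random_policy)
print(direct_steps, random_steps)
```

Both functions have the same signature, so the environment loop does not care which one it is given. The only difference between a "dumb" and a "smart" robot here is the mapping from state to action.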

Obviously, some policies are better than others, and there are several ways to assess them, most commonly the state-value function and the action-value function. The goal of RL is to learn the best policy. Now the definition should make more sense (note that in this context "time" is better understood as "state"):

A policy defines the learning agent's way of behaving at a given time.

Formally

More formally, we should first define a Markov Decision Process (MDP) as a tuple (S, A, P, R, γ), where:

S is a finite set of states
A is a finite set of actions
P is a state transition probability matrix (the probability of ending up in each state, for each current state and each action)
R is a reward function, given a state and an action
γ is a discount factor, between 0 and 1

Then, a policy π is a probability distribution over actions given states. That is, it gives the probability of each action when the agent is in a particular state (of course, I'm skipping a lot of details here). This definition corresponds to the second part of your definition.
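Written out, the definition above and the two value functions mentioned earlier look roughly like this. The notation follows the convention commonly used in introductory RL material: S_t, A_t and R_t denote the state, action and reward at time step t, and \gamma is the discount factor from the tuple. Treat it as a sketch of the standard definitions rather than a full treatment.

```latex
% Policy: probability of each action given the current state
\pi(a \mid s) = \mathbb{P}\left[ A_t = a \mid S_t = s \right],
\qquad \sum_{a \in A} \pi(a \mid s) = 1 \quad \text{for every } s \in S

% State-value function: expected discounted return when following \pi from s
v_\pi(s) = \mathbb{E}_\pi\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s \right]

% Action-value function: same, but the first action a is fixed
q_\pi(s, a) = \mathbb{E}_\pi\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s,\; A_t = a \right]
```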

I highly recommend David Silver's RL course available on YouTube. The first two lectures focus particularly on MDPs and policies.


smile是对你的礼貌

Post #3 · 2019-03-13 22:15

Here is a succinct answer: a policy is the 'thinking' of the agent. It answers the question: when the agent is in some state s, which action a should it take now? You can think of a policy as a lookup table:

state   action   probability / 'goodness' of taking the action
  1       1        0.6
  1       2        0.4
  2       1        0.3
  2       2        0.7

If you are in state 1, you'd (assuming a greedy strategy) pick action 1. If you are in state 2, you'd pick action 2.
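As a rough sketch, the lookup table and the greedy choice can be written like this. The names `policy_table` and `greedy_action` are made up for illustration.

```python
# The table above, stored as {state: {action: score}}.
policy_table = {
    1: {1: 0.6, 2: 0.4},
    2: {1: 0.3, 2: 0.7},
}

def greedy_action(state):
    """Return the action with the highest score in the given state."""
    scores = policy_table[state]
    return max(scores, key=scores.get)

print(greedy_action(1))  # 1
print(greedy_action(2))  # 2
```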


冷血范

Post #4 · 2019-03-13 22:32

In plain words, in the simplest case, a policy π is a function that takes as input a state s and returns an action a. That is: π(s) → a

In this way, the policy is typically used by the agent to decide what action a should be performed when it is in a given state s.

Sometimes, the policy can be stochastic instead of deterministic. In such a case, instead of returning a unique action a, the policy returns a probability distribution over a set of actions.
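The difference between the two kinds of policy can be sketched as follows. The integer state, the two action names, and the 0.8 / 0.2 probabilities are arbitrary assumptions chosen only to show the shape of each function.

```python
import random

def deterministic_policy(state):
    """pi(s) -> a: the same state always yields the same action."""
    return "right" if state >= 0 else "left"

def stochastic_policy(state):
    """pi(a | s): return a probability for every action in this state."""
    p_right = 0.8 if state >= 0 else 0.2
    return {"left": 1.0 - p_right, "right": p_right}

def sample_action(state, rng):
    """Draw one action from the distribution the stochastic policy returns."""
    dist = stochastic_policy(state)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

rng = random.Random(0)
samples = [sample_action(3, rng) for _ in range(1000)]
print(deterministic_policy(3))
print(samples.count("right") / len(samples))
```

The deterministic policy returns an action directly. The stochastic one returns a distribution, and the agent still has to sample from it before it can act, so the same state can lead to different actions on different visits.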

In general, the goal of any RL algorithm is to learn an optimal policy that achieves a specific goal.

