Is there a way to implement an OpenAI Gym environment where the action space changes at each step?
Question:
Answer 1:
Yes (though some of the premade agents may not work in this case).
    @property
    def action_space(self):
        # Do some code here to calculate the available actions
        return Something
The @property decorator is there so that you fit the standard format for a gym environment, where action_space is a property (env.action_space) rather than a method (env.action_space()).
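As a minimal sketch of the property approach, the class below recomputes its legal moves from the current state every time action_space is read. The LineWalk class, its size parameter, and the choice to return a plain list are all hypothetical, chosen so that the example runs without gym installed; a real environment would subclass gym.Env and would normally return a gym Space instance instead.

```python
class LineWalk:
    """Hypothetical toy environment: an agent on cells 0..size-1 of a line.

    Action 0 moves left and action 1 moves right; a move that would leave
    the line is not offered. This is a stand-in for a gym.Env subclass.
    """

    def __init__(self, size=4):
        self.size = size
        self.position = 0

    @property
    def action_space(self):
        # Recomputed on every access, so it always reflects the current state.
        actions = []
        if self.position > 0:
            actions.append(0)  # left is legal unless at the left edge
        if self.position < self.size - 1:
            actions.append(1)  # right is legal unless at the right edge
        return actions

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        if action not in self.action_space:
            raise ValueError(
                "action %r is not available at position %r" % (action, self.position)
            )
        self.position += 1 if action == 1 else -1
        done = self.position == self.size - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


env = LineWalk(size=3)
env.reset()
print(env.action_space)  # only "right" is offered at the left edge
env.step(1)
print(env.action_space)  # both moves are offered in the middle
```

Because the list is rebuilt on each access, there is no separate bookkeeping to keep in sync with the state, at the cost of recomputing it every time an agent reads env.action_space.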
Answer 2:
You could implement your own Space subclass and override shape, sample() and contains() so that they return values consistent with the currently available actions. Your environment then uses an instance of your custom class as its action_space, which you can update from within the environment on each step.
This can be done through additional methods that you provide, e.g. disable_actions() and enable_actions(), as follows:
    import gym
    import numpy as np


    # You could also inherit from Discrete or Box here and just override
    # shape, sample() and contains().
    class Dynamic(gym.Space):
        """
        x where x in available actions {0,1,3,5,...,n-1}
        Example usage:
        self.action_space = Dynamic(max_space=2)
        """

        def __init__(self, max_space):
            self.n = max_space
            # Initially all actions are available. Use a list (not a bare
            # range) so that actions can be appended again later.
            self.available_actions = list(range(max_space))

        def disable_actions(self, actions):
            """Call this inside your environment to remove available actions."""
            self.available_actions = [
                action for action in self.available_actions if action not in actions
            ]
            return self.available_actions

        def enable_actions(self, actions):
            """Call this inside your environment to enable actions."""
            # list.append() returns None, so mutate in place instead of
            # reassigning, and skip actions that are already enabled.
            for action in actions:
                if action not in self.available_actions:
                    self.available_actions.append(action)
            return self.available_actions

        def sample(self):
            return np.random.choice(self.available_actions)

        def contains(self, x):
            return x in self.available_actions

        @property
        def shape(self):
            """Return the new shape here."""
            return ()

        def __repr__(self):
            return "Dynamic(%d)" % self.n

        def __eq__(self, other):
            return self.n == other.n
You could also restrict the actions in the agent and only allow it to consider valid actions, but this will hinder the use of existing general purpose agents.
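Restricting actions on the agent side is often done by masking: the agent scores every action but only chooses among those that are currently allowed. A rough sketch of the idea, where select_action, q_values, valid_actions and epsilon are hypothetical names for illustration and not part of gym or any agent library:

```python
import random


def select_action(q_values, valid_actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the currently valid actions.

    q_values: sequence of estimated values, one per action index.
    valid_actions: iterable of action indices allowed in this state.
    """
    valid = list(valid_actions)
    if not valid:
        raise ValueError("no valid actions in this state")
    if rng.random() < epsilon:
        # Explore, but only among the valid actions.
        return rng.choice(valid)
    # Exploit: pick the best-valued action among the valid ones.
    return max(valid, key=lambda a: q_values[a])


q = [0.2, 0.9, 0.5, 0.1]
# Action 1 has the highest value but is not valid here, so action 2 is chosen.
print(select_action(q, valid_actions=[0, 2, 3], epsilon=0.0))
```

The valid_actions argument could come from something like the available_actions list of a custom space, which is also why this approach ties the agent to one particular environment interface.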
I found this link, which explains it very well (too long to quote here): "How do I let AI know that only some actions are available during specific states in reinforcement learning?"