I am attempting to create a custom environment for reinforcement learning with OpenAI Gym. I need to represent all possible values that the environment will see in a variable called observation_space. There are 3 possible actions for the agent to take, held in action_space.
To be more specific, the observation_space is a temperature sensor which will see values in the range of 50 to 150 degrees, and I think I can represent all of this by:
EDIT: I had the action_space numpy array wrong.
import numpy as np
action_space = np.array([0, 1, 2])
observation_space = np.arange(50, 151)  # 50, 51, ..., 150; arange excludes the stop value
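As an aside, since this is ultimately for a Gym environment, I believe Gym's built-in space classes are the usual way to declare these. A sketch of what I assume the equivalent would be (note the Box is continuous, so it wouldn't index a tabular Q table directly):

import numpy as np
from gym import spaces

action_space = spaces.Discrete(3)  # actions 0, 1, 2
observation_space = spaces.Box(low=50.0, high=150.0, shape=(1,), dtype=np.float32)  # one temperature reading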
Is there a better method that I could use for the observation_space where I could bin the data? I.e., make 20 bins: 50-55, 55-60, 60-65, etc.
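Something along these lines is what I'm imagining (just a sketch; I'm assuming NumPy's digitize and 5-degree bins here):

import numpy as np

# 20 bins of 5 degrees: [50, 55), [55, 60), ..., [145, 150]
bin_edges = np.linspace(50, 150, 21)              # 21 edges define 20 bins
reading = 72.3
state = int(np.digitize(reading, bin_edges)) - 1  # bin index 0-19
state = min(state, 19)                            # keep a reading of exactly 150 in the last bin
print(state)                                      # -> 4, since 72.3 falls in [70, 75)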
I think what I have will work, but it seems sort of cumbersome, and I am sure there is a better practice, as there is not a lot of wisdom on my end on this subject. This will print out a Q table:
action_size = action_space.shape[0]        # 3 actions
state_size = observation_space.shape[0]    # one state per degree
qtable = np.zeros((state_size, action_size))
print(qtable)
This is not really related to programming, so you may get better answers on stats.stackexchange. Anyway, it just depends on how much accuracy you want. I guess you want to change the temperature (increase, decrease, don't change) according to the sensor readings. Is there much difference (in terms of the optimal action) between 50 and 51 degrees? If not, then you can discretize the state space every 2 degrees, and so on.
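For example, a rough sketch of a 2-degree discretization with NumPy (the bin width is just an assumption; pick whatever granularity your problem needs):

import numpy as np

bin_width = 2                        # degrees per state; coarser bins -> smaller Q table
n_states = (150 - 50) // bin_width   # 50 states instead of one per degree
n_actions = 3                        # increase, decrease, don't change

qtable = np.zeros((n_states, n_actions))

def state_index(temp):
    # map a sensor reading in [50, 150] to a row of the Q table
    return min(int((temp - 50) // bin_width), n_states - 1)

print(qtable.shape)        # (50, 3)
print(state_index(101.7))  # 25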
More generally, by doing so you are using what in RL are called "features". A discretization over intervals of the state space is called tile coding and usually works well.
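To make the idea concrete, here is a minimal tile-coding sketch (the function and its parameters are mine, not from any library): several overlapping tilings, each shifted by a fraction of the tile width, so nearby temperatures share some tiles but not all:

def tile_code(temp, n_tilings=4, n_tiles=10, low=50.0, high=150.0):
    # return one active tile index per tiling for a temperature reading
    tile_width = (high - low) / n_tiles
    active = []
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings     # shift each tiling slightly
        idx = int((temp - low + offset) // tile_width)
        idx = min(max(idx, 0), n_tiles)         # the offset can push past the top edge
        active.append(t * (n_tiles + 1) + idx)  # flatten (tiling, tile) into one index
    return active

print(tile_code(72.3))  # -> [2, 13, 24, 35], one active tile per tiling

A learner then keeps one weight per tile and sums the weights of the active tiles, which generalizes more smoothly than a single hard binning.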
If you are new to RL, I really advise reading this book, or at least Chapters 1, 3, and 4, which are related to what you are doing.