I am trying to learn the parameters of a simple discrete HMM using PyMC. I am modeling the rainy-sunny model from the Wiki page on HMM. The model looks as follows:
I am using the following priors.
theta_start_state ~ beta(20,10)
theta_transition_rainy ~beta(8,2)
theta_transition_sunny ~beta(2,8)
theta_emission_rainy ~ Dirichlet(3,4,3)
theta_emission_sunny ~ Dirichlet(10,6,4)
Initially, I use this setup to create a training set as follows.
## Some not so informative priors!
# Prior on start state
theta_start_state = pm.Beta('theta_start_state',12,8)
# Prior on transition from rainy
theta_transition_rainy = pm.Beta('transition_rainy',8,2)
# Prior on transition from sunny
theta_transition_sunny = pm.Beta('transition_sunny',2,8)
# Prior on emission from rainy
theta_emission_rainy = pm.Dirichlet('emission_rainy',[3,4,3])
# Prior on emission from sunny
theta_emission_sunny = pm.Dirichlet('emission_sunny',[10,6,4])
# Start state
x_train_0 = pm.Categorical('x_0',[theta_start_state, 1-theta_start_state])
N = 100
# Create a train set for hidden states
x_train = np.empty(N, dtype=object)
# Creating a train set of observations
y_train = np.empty(N, dtype=object)
x_train[0] = x_train_0
for i in xrange(1, N):
if x_train[i-1].value==0:
x_train[i] = pm.Categorical('x_train_%d'%i,[theta_transition_rainy, 1- theta_transition_rainy])
else:
x_train[i] = pm.Categorical('x_train_%d'%i,[theta_transition_sunny, 1- theta_transition_sunny])
for i in xrange(0,N):
if x_train[i].value == 0:
# Rain
y_train[i] = pm.Categorical('y_train_%d' %i, theta_emission_rainy)
else:
y_train[i] = pm.Categorical('y_train_%d' %i, theta_emission_sunny)
However, I am not able to understand how to learn these parameters using PyMC. I made a start as follows.
@pm.observed
def y(x=x_train, value =y_train):
N = len(x)
out = np.empty(N, dtype=object)
for i in xrange(0,N):
if x[i].value == 0:
# Rain
out[i] = pm.Categorical('y_%d' %i, theta_emission_rainy)
else:
out[i] = pm.Categorical('y_%d' %i, theta_emission_sunny)
return out
The complete notebook containing this code can be found here.
Aside: The gist containing HMM code for a Gaussian is really hard to understand! (not documented)
Update
Based on the answers below, I tried changing my code as follows:
@pm.stochastic(observed=True)
def y(value=y_train, hidden_states = x_train):
def logp(value, hidden_states):
logprob = 0
for i in xrange(0,len(hidden_states)):
if hidden_states[i].value == 0:
# Rain
logprob = logprob + pm.categorical_like(value[i], theta_emission_rainy)
else:
# Sunny
logprob = logprob + pm.categorical_like(value[i], theta_emission_sunny)
return logprob
The next step would be to create a model and then run the MCMC algorithm. However, the above
edited code would also not work. It gives a ZeroProbability error
.
I am not sure if I have interpreted the answers correctly.
The first thing that pops out at me is the return value of your likelihood. PyMC expects a scalar return value, not a list/array. You need to sum the array before returning it.
Also, when you use a Dirichlet as a prior for the Categorical, PyMC detects this and fills in the last probability. Here's how I would code your
x_train
/y_train
loops:So, you grab the appropriate probabilities with a Lambda, and use it as the argument for the Categorical.
Just some thoughts on this:
In your likelihood function, you need to sum (for all time steps t) the log-probability of the observed values (of walk, shop and clean respectively) given the current (sampled) values of rainy/sunny (i.e., Weather) at the same time step t.
Learning
If you want to learn parameters for the model, you might want to consider switching to PyMC3 which would be better suited for automatically computing gradients for your logp function. But in this case (since you chose conjugate priors) this is not really neccessary. If you don't know what Conjugate Priors are, or are in need of an overview, ask Wikipedia for List of Conjugate Priors, it has a great article on that.
Depending on what you want to do, you have a few choices here. If you want to sample from the posterior distribution of all parameters, just specify your MCMC model as you did, and press the inference button, after that just plot and summarize the marginal distributions of the parameters you're interested in, and you are done.
If you are not interested in marginal posterior distributions, but rather in finding the joint MAP paramters, you might consider using Expectation Maximization (EM) learning or Simulated Annealing. Both should work reasonably well within the MCMC Framework.
For EM Learning simply repeat these steps until convergence:
I would use a small learning rate factor so you don't fall into the first local optimum (now we're approaching Simulated Annealing): Instead of treating the N samples you generated from the MCMC chain as actual observations, treat them as K observations (for a value K << N) by scaling the updates to the hyperparameters down by a learning rate factor of K/N.