What pymc3 Monte-Carlo stepper can I use for a cus

I am working on implementing hidden-Markov-Chains in pymc3. I have gotten pretty far in implementing the hidden states. Below, I am showing a simple 2-state Markov-chain:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Markov chain sample with 2 states that was created
# to have prob 0->1 = 0.1 and prob 1->0 = 0.3
sample = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
   0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
   1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
   0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0],
dtype=np.uint8)

I am now defining a class that describes the states. As input, I need to know the probability P1 to move from state 0 to state 1, and P2 to move from 1->0. I also need to know the probability PA for the first state being 0.

class HMMStates(pm.Discrete):
    """
    Hidden Markov Model States
    Parameters
    ----------
    P1 : tensor
        probability to remain in state 1
    P2 : tensor
        probability to move from state 2 to state 1

    """

    def __init__(self, PA=None, P1=None, P2=None,
                 *args, **kwargs):
        super(HMMStates, self).__init__(*args, **kwargs)
        self.PA = PA
        self.P1 = P1
        self.P2 = P2
        self.mean = 0.
        self.mode = tt.cast(0,dtype='int64')

    def logp(self, x):
        PA = self.PA
        P1 = self.P1
        P2 = self.P2

        # now we need to create an array with probabilities
        # so that for x=A: PA=P1, PB=(1-P1)
        # and for x=B: PA=P2, PB=(1-P2)

        choice = tt.stack((P1,P2))
        P = choice[x[:-1]]

        x_i = x[1:]

        ou_like = pm.Categorical.dist(P).logp(x_i)
        return pm.Categorical.dist(PA).logp(x[0]) + tt.sum(ou_like)

I am pretty proud of the advanced indexing ninja tricks that I learned on the theano google group. You can also implement the same with a tt.switch. Something I was not too sure about is the self.mode. I just gave it 0 to avoid a testvalue error. Here is how to use the class in a model that tests whether it works. In this case the state is not hidden, but observed.

with pm.Model() as model:
    # 2 state model
    # P1 is probablility to stay in state 1
    # P2 is probability to move from state 2 to state 1
    P1 = pm.Dirichlet('P1', a=np.ones(2))
    P2 = pm.Dirichlet('P2', a=np.ones(2))

    PA = pm.Deterministic('PA',P2/(P2+1-P1))

    states = HMMStates('states',PA,P1,P2, observed=sample)

    start = pm.find_MAP()

    trace = pm.sample(5000, start=start)

the output reproduces the data nicely. In the next model, I will show the problem. Here I do not directly observe the states but the state with some Gaussian noise added (thus hidden state). If you run the model with a Metropolis stepper then it crashes with an index error, which I traced back to problems related to using the Metropolis stepper on Categorical Distributions. Unfortunately, the only Stepper that would apply to my class is the CategoricalGibbsMetropolis stepper, but it refuses to work with my class since it is not explicitly a Categorial Distribution.

gauss_sample = sample*1.0 + 0.1*np.random.randn(len(sample))
from scipy import optimize
with pm.Model() as model2:
    # 2 state model
    # P1 is probablility to stay in state 1
    # P2 is probability to move from state 2 to state 1
    P1 = pm.Dirichlet('P1', a=np.ones(2))
    P2 = pm.Dirichlet('P2', a=np.ones(2))

    S = pm.InverseGamma('S',alpha=2.1, beta=1.1)

    PA = pm.Deterministic('PA',P2/(P2+1-P1))

    states = HMMStates('states',PA,P1,P2, shape=len(gauss_sample))

    emission = pm.Normal('emission',
                         mu=tt.cast(states,dtype='float64'),
                         sd=S,
                         observed = gauss_sample)

    start2 = pm.find_MAP(fmin=optimize.fmin_powell)
    step1 = pm.Metropolis(vars=[P1, P2, S, PA, emission])
    step2 = pm.ElemwiseCategorical(vars=[states], values=[0,1])
    trace2 = pm.sample(10000, start=start, step=[step1,step2])

The ElemwiseCategorical makes it run, but does not assign the correct value for my states. The states are either all 0, or all 1s.

How can I tell the ElemwiseCategorial to assign a vector of states of 1s and 0s, or alternatively how can I get the CategorialGibbsMetropolis to recognize my distribution as categorical. This must be a common problem with custom distributions.

Since I have not heard from anyone on my question, I answered it myself. The trick that I used was suggested by Chris Fonnesbeck on pymc3 github where I opened the issue. He suggested to subclass pm.Categorical.

class HMMStates(pm.Categorical):
    """
    Hidden Markov Model States
    Parameters
    ----------
    P1 : tensor
        probability to remain in state 1
    P2 : tensor
        probability to move from state 2 to state 1

    """

    def __init__(self, PA=None, P1=None, P2=None,
                 *args, **kwargs):
        super(pm.Categorical, self).__init__(*args, **kwargs)
        self.PA = PA
        self.P1 = P1
        self.P2 = P2
        self.k = 2 # two state model
        self.mean = 0.
        self.mode = tt.cast(0,dtype='int64')

    def logp(self, x):
        PA = self.PA
        P1 = self.P1
        P2 = self.P2

        # now we need to create an array with probabilities
        # so that for x=A: PA=P1, PB=(1-P1)
        # and for x=B: PA=P2, PB=(1-P2)
        PT = tt.stack((P1,P2))

        P = PT[x[:-1]]

        x_i = x[1:]

        ou_like = pm.Categorical.dist(P, shape=(N_chain-1,2)).logp(x_i)
        return pm.Categorical.dist(PA).logp(x[0]) + tt.sum(ou_like)

my HMMStates cannot really call the pm.Categorical super init, so I am calling the super class of pm.Categorical which is pm.Discrete. That trick makes it pass the test of BinaryGibbsMetropolis and CategoricalGibbsMetropolis.

If you are interested in implementing 2-state and multiple state HMM, I will implement all these cases here.