I am trying to fit a negative binomial mixture with PyMC. I seem to be doing something wrong, because the predictive distribution looks nothing like the input data. The problem is probably in the priors on the negative binomial parameters. Any suggestions?
import numpy as np
import pymc as mc  # PyMC 2.x API
from sklearn.cluster import KMeans

# `data` (a 1-D array of counts) and `n_samples` are assumed to be defined earlier.
n = 3  # number of components of the mixture
ndata = len(data)

# Mixture weights and the component label of each observation
dd = mc.Dirichlet('dd', theta=(1,) * n)
category = mc.Categorical('category', p=dd, size=ndata)

# k-means is not needed; it only centers the priors to help convergence
kme = KMeans(n)
kme.fit(data[:, np.newaxis])

# Per-component parameters; the third positional argument (0.1) is the precision tau
alphas = mc.TruncatedNormal('alphas', kme.cluster_centers_[:, 0], 0.1, a=0., b=100000, size=n)
means = mc.TruncatedNormal('means', kme.cluster_centers_[:, 0], 0.1, a=0., b=100000, size=n)

@mc.deterministic
def mean(category=category, means=means):
    return means[category]

@mc.deterministic
def alpha(category=category, alphas=alphas):
    return alphas[category]

obs = mc.NegativeBinomial('obs', mean, alpha, value=data, observed=True)
predictive = mc.NegativeBinomial('predictive', mean, alpha)

model = mc.Model({'dd': dd,
                  'category': category,
                  'alphas': alphas,
                  'means': means,
                  'predictive': predictive,
                  'obs': obs})
mcmc = mc.MCMC(model)
mcmc.sample(iter=n_samples, burn=int(n_samples * 0.7))
You have correctly implemented a Bayesian estimation of a mixture of three distributions, but the MCMC model gives wrong-looking values.
The problem is that category is not converging quickly enough, and the parameters in means, alphas, and dd run away from the good values before category decides which points belong to which distribution.
You can see that the posterior for category is wrong by visualizing it.
Expectation-maximization is the classic approach to stabilize the latent variables, but you can also use the results of the quick-and-dirty k-means fit to provide initial values for the MCMC.
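
The answer's own snippet for this step is not preserved here, so the following is only a minimal sketch of the idea. The helper name `initial_values_from_labels` is hypothetical, and the code assumes you already have one hard label per data point (for example `kme.labels_` from the k-means fit) and that every component receives at least one point.

```python
from collections import Counter


def initial_values_from_labels(data, labels, n_components):
    """Turn a hard clustering into starting values for a mixture model.

    Hypothetical helper: `labels` would be e.g. `kme.labels_` from k-means.
    Assumes every component has at least one point assigned to it.
    """
    labels = [int(k) for k in labels]  # also accepts NumPy integer labels
    counts = Counter(labels)
    sums = [0.0] * n_components
    for x, k in zip(data, labels):
        sums[k] += x
    return {
        "category": labels,
        "means": [sums[k] / counts[k] for k in range(n_components)],
        "weights": [counts[k] / len(labels) for k in range(n_components)],
    }


# Toy data with an obvious two-cluster structure
init = initial_values_from_labels([1, 3, 10, 12, 14], [0, 0, 1, 1, 1], 2)
print(init["means"])    # [2.0, 12.0]
print(init["weights"])  # [0.4, 0.6]
```

In PyMC 2 such values can be supplied through the `value=` argument of a stochastic, for example `mc.Categorical('category', p=dd, value=init['category'])`, so that the sampler starts from the k-means assignment rather than from a random one.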
Then the estimates converge to reasonable-looking values.
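
One way to check the assignments, with or without the initial values, is to summarize the sampled category values per data point before plotting them. This is a sketch under assumptions: `membership_frequencies` is a hypothetical helper, and the trace is assumed to have one row per retained sample and one column per data point, which is the layout I would expect from `mcmc.trace('category')[:]` in PyMC 2.

```python
def membership_frequencies(category_trace, n_components):
    """For each data point, the fraction of samples assigned to each component."""
    n_samples = len(category_trace)
    n_points = len(category_trace[0])
    counts = [[0] * n_components for _ in range(n_points)]
    for sample in category_trace:
        for i, k in enumerate(sample):
            counts[i][int(k)] += 1
    return [[c / n_samples for c in row] for row in counts]


# Toy trace: 4 retained samples, 3 data points, 2 components
toy_trace = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 1],
             [0, 0, 1]]
freqs = membership_frequencies(toy_trace, 2)
print(freqs)  # [[1.0, 0.0], [0.25, 0.75], [0.25, 0.75]]
```

Rows close to 0 or 1 that agree with the visible clusters in the data suggest that category has settled. Rows that are nearly flat, or that contradict an obvious grouping, point to the convergence problem described above.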
For your prior on alpha, you can just use the same distribution for every component; it does not need to be centered on the cluster centers.
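
The prior the answer originally showed is not preserved here. As a rough guide to the scale a shared prior should cover, the sketch below uses a method-of-moments estimate, which is a swapped-in heuristic and not the answer's method. It assumes the (mu, alpha) parameterization of PyMC 2's NegativeBinomial, in which the variance is mu + mu**2 / alpha. Under that parameterization alpha is a dispersion parameter, not a location on the scale of the data. The function name `moment_alpha` is hypothetical.

```python
import statistics


def moment_alpha(counts):
    """Method-of-moments alpha, assuming Var = mu + mu**2 / alpha.

    Returns None when the sample is not overdispersed (variance <= mean),
    because no finite positive alpha matches the moments in that case.
    """
    mu = statistics.mean(counts)
    var = statistics.pvariance(counts)
    if var <= mu:
        return None
    return mu ** 2 / (var - mu)


# Mean 4 and population variance 8, so alpha = 16 / (8 - 4)
print(moment_alpha([0, 2, 4, 6, 8]))  # 4.0
```

Applying this to the points in each k-means cluster gives a feel for the range of alpha values that one broad prior, shared by all components, would need to cover.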
This problem is not specific to the negative binomial distribution: Dirichlet mixtures of normal distributions fail in the same way. It results from having a high-dimensional categorical variable (one label per data point), which MCMC is not efficient at exploring.