Negative Binomial Mixture in PyMC

I am trying to fit a Negative binomial mixture with PyMC. It seems I do something wrong, because the predictive doesn't look at all similar to the input data. The problem is probably in the prior of the Negative binomial parameters. Any suggestions?

    from sklearn.cluster import KMeans
    import pymc as mc
    n = 3 #Number of components of the mixture
    ndata = len(data)

    dd = mc.Dirichlet('dd', theta=(1,)*n)
    category = mc.Categorical('category', p=dd, size=ndata)

    kme = KMeans(n) # This is not needed but it is to help convergence
    kme.fit(data[:,newaxis])
    alphas = mc.TruncatedNormal('alphas', kme.cluster_centers_[:,0], 0.1, a=0. ,b=100000 ,size=n)
    means = mc.TruncatedNormal('means', kme.cluster_centers_[:,0],0.1,a=0.0 ,b=100000, size=n)

    @mc.deterministic
    def mean(category=category, means=means):
        return means[category]

    @mc.deterministic
    def alpha(category=category, alphas=alphas):
        return alphas[category]

    obs = mc.NegativeBinomial('obs', mean, alpha, value=data, observed = True)

    predictive = mc.NegativeBinomial('predictive', mean, alpha)

    model = mc.Model({'dd': dd,
                  'category': category,
                  'alphas': alphas,
                  'means': means,
                  'predictive':predictive,
                  'obs': obs})

    mcmc = mc.MCMC( model )
    mcmc.sample( iter=n_samples, burn=int(n_samples*0.7))

标签： python statistics modeling pymc markov-chains

1条回答

SAY GOODBYE

2楼-- · 2019-08-13 03:31

You have correctly implemented a Bayesian estimation of a mixture of three distributions, but the MCMC model gives wrong-looking values.

The problem is that category is not converging quickly enough, and the parameters in means, alphas, and dd run away from the good values before category decides which points belong to which distribution.

data = np.atleast_2d(list(mc.rnegative_binomial(100., 10., size=s)) +
    list(mc.rnegative_binomial(200., 1000., size=s)) +
    list(mc.rnegative_binomial(300., 1000., size=s))).T
nsamples = 10000

You can see that the posterior for category is wrong by visualizing it:

G = [data[np.nonzero(np.round(mcmc.trace("category")[:].mean(axis=0)) == i)]
    for i in range(0,3) ]
plt.hist(G, bins=30, stacked = True)

category posteriors of the input data, no initialization

Expectation-maximization is the classic approach to stabilize the latent variables, but you can also use the results of the quick-and-dirty k-means fit to provide initial values for the MCMC:

category = mc.Categorical('category', p=dd, size=ndata, value=kme.labels_)

Then the estimates converge to reasonable-looking values.

category posteriors using kmeans to initialize

For your prior on alpha, you can just use the same distribution for all of them:

alphas = mc.Gamma('alphas', alpha=1, beta=.0001 ,size=n)

This problem is not specific to the negative binomial distribution; Dirichlet-mixtures of normal distributions fail in the same way; it results from having a high-dimensional categorical distribution that MCMC is not efficient at optimizing.

0人赞添加讨论(0) 举报

Negative Binomial Mixture in PyMC

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间