goodness of fit in pymc and plotting discrepancies

2019-03-04 00:56发布

问题:

I'm using PYMC 2.3.4. I found terrific. Now I would like to do some goodness of the fit and plot discrpancies how shown in section 7.3 of the documentation (https://pymc-devs.github.io/pymc/modelchecking.html) In the documentation they say that you need 3 inputs for the discrepancy plot

  1. x: the data
  2. x_sim: the posterior distribution sample
  3. x_exp:expected value I can understand the first two but not the third

This the code

Sero=[0,1,4,2,2,7,13,17,90]

Pop=[ 15,145,170,132,107,57,68,57,251]

for i in range(len(Pop)):
   prob[i] = pymc.Uniform(`prob_%i' % i, 0,1.0)

serobservation=pymc.Binomial('serobservation',n=Pop,p=prob,value=Sero,observed=True)
pobservation=pymc.Binomial('pobservation',n=Pop,p=prob)
mod=pymc.Model([serobservation,pobservation,prob])
mc=pymc.MCMC(mod)
mc.sample(20000)

Everything works fine and then I try to plot discrepancy but I don't Know what to put as expected values Can you help on that? till now I've done in this way:

D = pymc.discrepancy( Sero,pobservation,serobservation)
pymc.Matplot.discrepancy_plot(D, name='D', report_p=True)

but I have the error

AttributeError: 'Binomial' object has no attribute 'trace'

What can I do? Can you please provide me of an example for dummy of how create the expected values? Moreover when I use the function

pymc.Matplot.gof_plot(pobservation,Sero )

only a plot for the last entries of the array is done How can I have a plot for each entry?

Thank you for all your help

回答1:

If you are using the built-in stochastics, there are expval functions that is the expected value for that distribution (in the case of binomial, the function is just binomial_expval, which is just p*n).

In general, I recommend using the gof_plot to produce posterior predictive plots, rather than discrepancy_plot. There are some issues with your code:

  1. Its not clear why you are building a list of prob nodes, rather than just specifying a vector-valued Uniform:

    prob = pymc.Uniform('prob', 0, 1, size=len(Pop))
    
  2. You never need to instantiate Model directly; Just MCMC:

    mc=pymc.MCMC([serobservation,pobservation,prob])
    

With these changes, I get 8 GOF plots, one for each datum. Here is a zip file containing the plots that I got.