I'm trying to understand if there is any meaningful difference in the ways of passing data into a model - either aggregated or as single trials (note this will only be a sensical question for certain distributions e.g. Binomial).
Predicting p for a yes/no trail, using a simple model with a Binomial distribution.
What is the difference in the computation/results of the following models (if any)?
I choose the two extremes, either passing in a single trail at once (reducing to Bernoulli) or passing in the sum of the entire series of trails, to exemplify my meaning though I am interested in the difference in between these extremes also.
# set up constants
p_true = 0.1
N = 3000
observed = scipy.stats.bernoulli.rvs(p_true, size=N)
Model 1: combining all observations into a single data point
with pm.Model() as binomial_model1:
p = pm.Uniform('p', lower=0, upper=1)
observations = pm.Binomial('observations', N, p, observed=np.sum(observed))
trace1 = pm.sample(40000)
Model 2: using each observation individually
with pm.Model() as binomial_model2:
p = pm.Uniform('p', lower=0, upper=1)
observations = pm.Binomial('observations', 1, p, observed=observed)
trace2 = pm.sample(40000)
There is isn't any noticeable difference in the trace or posteriors in this case. I attempted to dig into the pymc3 source code to try to see how the observations were being processed but couldn't find the right part.
Possible expected answers:
- pymc3 aggregates the observations under the hood for Binomial anyway so their is no difference
- the resultant posterior surface (which is explored in the sample process) is identical in each case -> there is no meaningful/statistical difference in the two models
- there are differences in the resultant statistics because of this and that...
This is an interesting example! Your second suggestion is correct: you can actually work out the posterior analytically, and it will be distributed according to
in either case.
The difference in modelling approach would show up if you used, for example,
pm.sample_ppc
, in that the first would be distributed according toBinomial(N, p)
and the second would beN
draws ofBinomial(1, p)
.