Sample integer values with a specified mean

Posted 2019-07-21 15:22

Question:

I want to generate a sample of integers in R with a specified mean. I used mu+sd*scale(rnorm(n)) to generate a sample of n values whose mean is exactly mu, but this produces floating-point values; I would like integer values instead. For example, with mean=4 and sample size n=5, one possible sample would be {2,6,4,3,5}. Any ideas on how to do this in R while satisfying the constraint that the sample mean equals a specific value?
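A minimal sketch of the approach described in the question, to show where it breaks down. The standard deviation `s = 1.5` is an illustrative assumption (the question does not give one), and it is named `s` rather than `sd` so it does not shadow the `sd()` function. `scale()` centers the draws to mean 0 and rescales them to standard deviation 1, so the mean comes out exactly `mu`; rounding to integers generally destroys that.

```r
mu <- 4     # target mean (from the question)
s  <- 1.5   # illustrative standard deviation (assumed; not given in the question)
n  <- 5     # sample size (from the question)

set.seed(42)
## scale() centers to mean 0 and rescales to sd 1, so x has mean mu and sd s
x <- as.vector(mu + s * scale(rnorm(n)))
mean(x)   # mu, up to floating-point error
sd(x)     # s, up to floating-point error

## rounding gives integers, but the mean is generally no longer exactly mu
xi <- round(x)
mean(xi)
```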

Answer 1:

Picking n values with a mean of m is equivalent to picking n values that sum to m*n. (I'm assuming you want to stick to positive integers; otherwise things get much harder!) Here is a solution based on sampling uniformly from the partitions of m*n (unordered sets of positive integers that add up to the desired total). I'm not sure it's what you want, though: it samples uniformly over partitions, not over ordered sequences of values, so {4,4,4,4,4} is exactly as likely as {6,5,4,3,2}, even though the latter can be arranged in many more orders. Perhaps someone else can do better, or figure out how to reweight the samples.
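If the goal is instead to sample uniformly over ordered vectors of positive integers with the required sum, one standard technique (swapped in here; it is not the partition method used in this answer) is "stars and bars": lay out m*n units in a row and choose n - 1 distinct cut points among the m*n - 1 gaps between them. The helper name `rcomposition_mean` is hypothetical, and the sketch assumes m*n is a whole number with m*n >= n.

```r
## Hypothetical helper (not from the answer): draw n positive integers with
## sum m*n, i.e. mean exactly m, uniformly over all ordered vectors
## ("compositions" of m*n into n parts), using stars and bars.
rcomposition_mean <- function(n, m) {
  total <- m * n
  stopifnot(n >= 1, abs(total - round(total)) < 1e-8)  # m*n must be a whole number
  total <- round(total)
  stopifnot(total >= n)                                 # need at least 1 per value
  ## choose n - 1 distinct cut points among the total - 1 gaps between units
  cuts <- sort(sample.int(total - 1, n - 1))
  ## the lengths of the pieces between consecutive cuts are the sampled values
  diff(c(0, cuts, total))
}

set.seed(101)
x <- rcomposition_mean(n = 5, m = 4)
x        # five positive integers
sum(x)   # 20
mean(x)  # 4
```

For n = 5 and m = 4, each of the choose(19, 4) = 3876 ordered 5-tuples of positive integers summing to 20 is equally likely, and no enumeration is needed, so this scales to large totals.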

This brute-force solution will also probably become impractical for totals much larger than your example, because it enumerates every partition and their number grows rapidly (627 partitions for a total of 20, 5604 for a total of 30, 37338 for a total of 40, ...).
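To check how fast the enumeration grows before running it, the partition counts can be computed with a standard dynamic program in base R. The function names below are hypothetical; the identity used for "exactly n parts" (subtract 1 from every part, leaving a partition of N - n into parts of size at most n) is a standard result, and this is a sketch rather than part of the original answer.

```r
## Hypothetical helper: number of partitions of N using parts <= max_part,
## via the classic "coin change" dynamic program.
count_partitions <- function(N, max_part = N) {
  ways <- numeric(N + 1)   # ways[s + 1] = number of partitions of s found so far
  ways[1] <- 1             # one (empty) partition of 0
  for (k in seq_len(min(max_part, N))) {
    for (s in k:N) {
      ways[s + 1] <- ways[s + 1] + ways[s - k + 1]
    }
  }
  ways[N + 1]
}

## Partitions of N into exactly n positive parts: subtracting 1 from each part
## leaves a partition of N - n into parts of size at most n.
count_partitions_exact <- function(N, n) {
  if (N < n) return(0)
  count_partitions(N - n, max_part = n)
}

sapply(c(20, 30, 40), count_partitions)  # 627 5604 37338
count_partitions_exact(20, 5)            # partitions left after keeping exactly 5 parts
```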

library("partitions")
m <- 4  ## desired mean
n <- 5  ## sample size
pp <- parts(m*n) ## all partitions of m*n (=20 here), one per column, padded with zeros
## restrict to partitions with exactly n (=5) non-zero parts, keeping the first n rows
pp5 <- pp[seq_len(n), colSums(pp > 0) == n]
set.seed(101) ## for reproducibility
## sample one partition uniformly from this set
pp5[, sample(ncol(pp5), size = 1)]  ## 9, 5, 4, 1, 1
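One way to do the reweighting suggested above, sketched under the assumption that you want every ordered vector to be equally likely: weight each partition by its number of distinct orderings, n! / prod(multiplicity!), then shuffle the chosen partition. The helpers `n_orderings` and `sample_ordered_from_partitions` are hypothetical (not part of the partitions package), and the small check uses a hand-written matrix so it runs without the package.

```r
## Hypothetical helper: number of distinct orderings of one partition p.
n_orderings <- function(p) {
  factorial(length(p)) / prod(factorial(table(p)))
}

## Hypothetical helper: given a matrix with one partition per column, pick a
## column with probability proportional to its number of orderings, then
## shuffle it, so every ordered vector has the same probability.
sample_ordered_from_partitions <- function(parts_mat) {
  w <- apply(parts_mat, 2, n_orderings)
  j <- sample.int(ncol(parts_mat), size = 1, prob = w)
  v <- parts_mat[, j]
  v[sample.int(length(v))]  # random permutation of the chosen partition
}

## Small check: partitions of 6 into exactly 3 positive parts, one per column
small <- cbind(c(4, 1, 1), c(3, 2, 1), c(2, 2, 2))
apply(small, 2, n_orderings)       # 3 6 1
sum(apply(small, 2, n_orderings))  # 10 ordered triples of positive integers sum to 6
set.seed(101)
sample_ordered_from_partitions(small)
```

Applied to pp5 above (assuming it can be used like an ordinary integer matrix), the weights should sum to choose(19, 4) = 3876, the number of ordered 5-tuples of positive integers that add up to 20.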


Tags: r random mean