How to generate a binomial vector of n correlated items

Posted 2019-07-09 07:16

Question:

I want to generate a binomial vector based on a number of correlated items, each with a defined probability. When I use, e.g., rbinom(1e3, size = 4, prob = c(p.x1, p.x2, p.x3, p.x4)), I get something like 3 3 0 0 2 4 1 0 4 4 0 1 4.... Now these x_i have defined probabilities but are not correlated.
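For reference, here is a minimal base-R sketch of the uncorrelated baseline described above, written as an explicit sum of four independent Bernoulli items. The probabilities are placeholder values, not the ones from the question. One caveat worth checking: rbinom() recycles a vector-valued prob across the n draws, so the call above draws observation i from Binomial(4, p) with p cycling through the four values, rather than summing one item per probability.

```r
set.seed(42)
probs <- c(0.1, 0.3, 0.5, 0.7)   # placeholder values standing in for p.x1 ... p.x4
n <- 1000

# One Bernoulli column per item, each with its own probability
items <- sapply(probs, function(p) rbinom(n, size = 1, prob = p))

# The target vector: number of "successes" among the four items in each draw
x <- rowSums(items)
head(x)

# The items are independent, so pairwise sample correlations should be near 0
round(cor(items), 2)
```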

Five years ago Josh O'Brien contributed a great approach to generate correlated binomial data. I think it is close to what I need, but it is designed for pairs, and I want (1) a single vector and (2) more than two variables. I have already tried to modify the function to produce a greater number of variables, but with no success so far, and I frequently run into this error:

Error in commonprob2sigma(commonprob, simulvals) : 
Matrix commonprob not admissible. 

which is raised by the imported bindata package.

My idea is to give Josh's function four (or, better, an arbitrary number of) probabilities and rhos, something like

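library(bindata)  # provides rmvbin()

rmvBinomial3 <- function(n, size, p1, p2, p3, p4, rho) {
  X <- replicate(n, {
    # one common rho for all pairs: 1 on the diagonal, rho elsewhere
    colSums(rmvbin(size, c(p1, p2, p3, p4), bincorr = (1 - rho) * diag(4) + rho))
  })
  t(X)  # n x 4 matrix, one column per item
}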

Sure, more rhos are needed, and I guess a probability matrix should be included somehow, as can be done with the bindata package. I do not know how to include it.

rho1 <- -0.89; rho2 <- -0.75; rho3 <- -0.62; rho4 <- -0.59
m <- matrix(c(1, rho1, rho2, rho3,
     rho1, 1, rho4, rho2,
     rho2, rho4, 1, rho1,
     rho3, rho2, rho1, 1), ncol = 4) 
#       [,1]  [,2]  [,3]  [,4]
# [1,]  1.00 -0.89 -0.75 -0.62
# [2,] -0.89  1.00 -0.59 -0.75
# [3,] -0.75 -0.59  1.00 -0.89
# [4,] -0.62 -0.75 -0.89  1.00

Unfortunately, every matrix I test with bindata::check.commonprob(m), to see whether it meets the conditions for bindata, gives me the same error as above. I also could not manage to get a matrix out of bindata::commonprob2sigma().
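A quick base-R check, independent of bindata, may help narrow this down. A valid correlation matrix must be positive semidefinite, and the sketch below tests that for the matrix m from the question. It does not reproduce bindata's own admissibility test, which may impose further conditions. As a related fact, when four items share a single rho, the matrix (1-rho)*diag(4)+rho has eigenvalues 1-rho and 1+3*rho, so it is only valid for rho >= -1/3.

```r
rho1 <- -0.89; rho2 <- -0.75; rho3 <- -0.62; rho4 <- -0.59
m <- matrix(c(1, rho1, rho2, rho3,
              rho1, 1, rho4, rho2,
              rho2, rho4, 1, rho1,
              rho3, rho2, rho1, 1), ncol = 4)

# A correlation matrix needs all eigenvalues >= 0
ev <- eigen(m, symmetric = TRUE)$values
min(ev) < 0                      # TRUE: m is not positive semidefinite

# Quadratic form with the all-ones vector; it equals sum(m) and is negative here,
# which is impossible for a genuine covariance or correlation matrix
ones <- rep(1, 4)
drop(t(ones) %*% m %*% ones)     # -4.98
```

Put differently, four variables cannot all be strongly negatively correlated with one another, so no sampler could honour this particular m.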

Another drawback for me is the range of Josh's rmvBinomial(): it seems to work only for values of p.X_i between roughly 0.2 and 0.8, and I also need smaller values, e.g. 0.01 to 0.1.
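One possible explanation for the restricted range, offered as a sketch rather than a diagnosis of bindata's behaviour: the correlation between two Bernoulli variables cannot take arbitrary values in [-1, 1]. It is bounded by the marginal probabilities, because P(both = 1) must lie between max(0, p1 + p2 - 1) and min(p1, p2). The helper below (bernoulliCorrRange is a hypothetical name, not a bindata function) computes that feasible range.

```r
# Hypothetical helper: feasible correlation range for two Bernoulli variables
# with success probabilities p1 and p2
bernoulliCorrRange <- function(p1, p2) {
  s <- sqrt(p1 * (1 - p1) * p2 * (1 - p2))        # product of the two standard deviations
  lower <- (max(0, p1 + p2 - 1) - p1 * p2) / s    # uses the smallest possible P(both = 1)
  upper <- (min(p1, p2) - p1 * p2) / s            # uses the largest possible P(both = 1)
  c(lower = lower, upper = upper)
}

bernoulliCorrRange(0.5, 0.5)    # full range: -1 to 1
bernoulliCorrRange(0.01, 0.1)   # narrow range: roughly -0.03 to 0.30
```

With probabilities as small as 0.01 and 0.1, a strongly negative rho lies far outside the attainable range, so any requested correlation would need to be checked against these bounds first.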

Any help is greatly appreciated.

Edit: To clarify, the expected outcome is indeed just a single vector such as 3 3 0 0 2 4 1 0 4 4 0 1 4..., as shown at the beginning, but the items from which it is derived should be correlated to a definable degree (i.e., one of the items could have no correlation at all).
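To make the target concrete, here is a minimal base-R sketch of one way such a vector could be produced: draw correlated standard normals, threshold each one at the quantile matching its item probability, and sum the resulting 0/1 items per draw. This is a Gaussian-copula style construction swapped in for illustration, not Josh's function. The name rCorrBinomSum and the argument latent_corr are hypothetical, and the probabilities and correlations are placeholders. Note that latent_corr is the correlation of the underlying normals; the correlation between the binary items generally differs from it and would need to be calibrated separately.

```r
# Hypothetical helper, not part of bindata: sum of k correlated Bernoulli items
# per draw, obtained by thresholding correlated standard normals
rCorrBinomSum <- function(n, probs, latent_corr) {
  k <- length(probs)
  stopifnot(all(dim(latent_corr) == c(k, k)),
            all(eigen(latent_corr, symmetric = TRUE)$values > 0))  # must be positive definite
  # Rows of z are N(0, latent_corr): independent normals times the Cholesky factor
  z <- matrix(rnorm(n * k), nrow = n) %*% chol(latent_corr)
  # Item j is 1 when z_j falls below qnorm(p_j), which happens with probability p_j
  items <- sweep(z, 2, qnorm(probs), FUN = "<")
  rowSums(items)
}

# Usage: items 1 and 2 are positively linked, items 3 and 4 are uncorrelated with the rest
set.seed(1)
latent <- diag(4)
latent[1, 2] <- latent[2, 1] <- 0.6
x <- rCorrBinomSum(1e3, probs = c(0.05, 0.1, 0.3, 0.5), latent_corr = latent)
head(x, 13)
```

Leaving an off-diagonal entry at 0 keeps that item uncorrelated with the others, which covers the case mentioned in the edit, and small probabilities such as 0.05 pose no problem for the thresholding step itself.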