I am trying to find a way to generate correlated random numbers from several binomial distributions.
I know how to do it with normal distributions (using mvrnorm), but I did not find a function applicable to binomial ones.
You can generate correlated uniforms using the copula package, then use the qbinom function to convert those to binomial variables. Here is one quick example:
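library(copula)
# Gaussian (normal) copula in 2 dimensions with parameter 0.75
tmp <- normalCopula(0.75, dim = 2)
# 1000 pairs of correlated Uniform(0,1) values
# (older copula versions used rcopula(tmp, 1000); current versions use rCopula(n, copula))
x <- rCopula(1000, tmp)
# Map each margin through the binomial quantile function
x2 <- cbind(qbinom(x[,1], 10, 0.5), qbinom(x[,2], 15, 0.7))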
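Now x2 is a matrix whose 2 columns are correlated binomial variables, Binomial(10, 0.5) and Binomial(15, 0.7). Note that 0.75 is the correlation of the underlying normal variables; because the binomials are discretized monotone transforms of them, the correlation between the columns of x2 will generally be somewhat lower.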
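The same Gaussian-copula idea can be sketched without the copula package by starting from mvrnorm() in MASS, which the question already mentions: draw correlated standard normals, map them to uniforms with pnorm(), then apply qbinom(). The helper name rbinom_gauss_copula() is hypothetical, and the parameters below simply mirror the copula example.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical helper: Gaussian-copula binomial pairs built from mvrnorm()
rbinom_gauss_copula <- function(n, size1, p1, size2, p2, rho) {
  Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)   # correlation of the latent normals
  z <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)   # n x 2 correlated standard normals
  u <- pnorm(z)                                  # correlated Uniform(0,1) values
  cbind(qbinom(u[, 1], size1, p1), qbinom(u[, 2], size2, p2))
}

set.seed(1)
y <- rbinom_gauss_copula(10000, 10, 0.5, 15, 0.7, 0.75)
colMeans(y)   # close to 10*0.5 = 5 and 15*0.7 = 10.5
cor(y)[1, 2]  # positive; compare with the latent correlation 0.75
```

This is exactly what normalCopula() plus rCopula() does for the uniforms, so the two approaches should give equivalent joint distributions.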
A binomial variable with n trials and probability p of success in each trial can be viewed as the sum of n independent Bernoulli trials, each with success probability p.
Similarly, you can construct a pair of correlated binomial variates by summing independent pairs of Bernoulli variates, where the two variates within each pair have the desired correlation rho.
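Why the sums keep the same correlation: let (B_1i, B_2i), i = 1..n, be independent pairs with B_1i ~ Bernoulli(p1), B_2i ~ Bernoulli(p2) and Corr(B_1i, B_2i) = rho, and let X1 and X2 be the two column sums. Cross terms with i ≠ j vanish because different pairs are independent, so covariance and variances both scale by n:

$$
\operatorname{Cov}(X_1,X_2)=\sum_{i=1}^{n}\operatorname{Cov}(B_{1i},B_{2i})=n\,\rho\sqrt{p_1(1-p_1)\,p_2(1-p_2)},
$$

$$
\operatorname{Corr}(X_1,X_2)=\frac{n\,\rho\sqrt{p_1(1-p_1)\,p_2(1-p_2)}}{\sqrt{n\,p_1(1-p_1)}\;\sqrt{n\,p_2(1-p_2)}}=\rho .
$$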
library(bindata)  # provides rmvbin()
# Parameters of joint distribution
size <- 20
p1 <- 0.5
p2 <- 0.3
rho<- 0.2
# Create one pair of correlated binomial values:
# bincorr = (1-rho)*diag(2)+rho is the 2x2 matrix with 1s on the diagonal and rho off it
trials <- rmvbin(size, c(p1,p2), bincorr=(1-rho)*diag(2)+rho)
colSums(trials)
# A function to create n correlated pairs
rmvBinomial <- function(n, size, p1, p2, rho) {
X <- replicate(n, {
colSums(rmvbin(size, c(p1,p2), bincorr=(1-rho)*diag(2)+rho))
})
t(X)
}
# Try it out, creating 1000 pairs
X <- rmvBinomial(1000, size=size, p1=p1, p2=p2, rho=rho)
# cor(X[,1], X[,2])
# [1] 0.1935928 # (In ~8 trials, sample correlations ranged between 0.15 & 0.25)
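If installing bindata is not an option, the same Bernoulli-sum construction can be sketched in base R. rmvBinomial_base() below is a hypothetical helper, not from any package. For two Bernoulli variables with means p1, p2 and correlation rho, the probability P(B1 = 1, B2 = 1) = p1*p2 + rho*sqrt(p1(1-p1)p2(1-p2)) fixes the whole 2x2 table, so each pair can be drawn as one of four cells with sample.int(). It also makes explicit that not every rho is attainable: all four cell probabilities must be non-negative.

```r
# Hypothetical base-R variant of rmvBinomial(): no bindata dependency
rmvBinomial_base <- function(n, size, p1, p2, rho) {
  # P(B1 = 1, B2 = 1) giving the two Bernoulli variables correlation rho
  p11 <- p1 * p2 + rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
  # cell probabilities for (1,1), (1,0), (0,1), (0,0)
  probs <- c(p11, p1 - p11, p2 - p11, 1 - p1 - p2 + p11)
  if (any(probs < 0)) stop("rho is not attainable for these p1, p2")
  # draw all n * size Bernoulli pairs at once
  cell <- sample.int(4, n * size, replace = TRUE, prob = probs)
  b1 <- matrix(cell %in% c(1, 2), nrow = n)  # first variable is 1 in cells (1,1), (1,0)
  b2 <- matrix(cell %in% c(1, 3), nrow = n)  # second variable is 1 in cells (1,1), (0,1)
  # each row sums size Bernoulli pairs into one correlated binomial pair
  cbind(rowSums(b1), rowSums(b2))
}

set.seed(42)
X <- rmvBinomial_base(10000, size = 20, p1 = 0.5, p2 = 0.3, rho = 0.2)
colMeans(X)          # close to 20*0.5 = 10 and 20*0.3 = 6
cor(X[, 1], X[, 2])  # close to 0.2
```

Drawing all n * size cells in one vectorized call avoids the per-pair loop that replicate() performs, which matters when n is large.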
It's important to note that there are many different joint distributions that share the desired correlation coefficient. The simulation method in rmvBinomial() produces one of them, but whether or not it's the appropriate one will depend on the process that's generating your data.
As noted in this R-help answer to a similar question (which then goes on to explain the idea in more detail):
while a bivariate normal (given means and variances) is uniquely defined by the correlation coefficient, this is not the case for a bivariate binomial
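To make this concrete, here is a small exact calculation (an illustration, not taken from the R-help answer): two joint distributions of (X1, X2), each with Binomial(2, 0.5) margins and correlation rho = 0.2. (A) is the Bernoulli-sum construction used by rmvBinomial(); (B) is a hypothetical mixture that sets X2 = X1 with probability rho and draws them independently otherwise. Both have the same correlation, yet they assign different probabilities to the same outcome.

```r
rho <- 0.2
p <- 0.5
vals <- 0:2                  # support of Binomial(2, p)
marg <- dbinom(vals, 2, p)   # 0.25 0.50 0.25

# (A) Sum of 2 independent Bernoulli pairs, each pair with correlation rho
p11 <- p * p + rho * p * (1 - p)                 # P(B1 = 1, B2 = 1)
pair <- matrix(c(1 - 2 * p + p11, p - p11,
                 p - p11,         p11),
               nrow = 2, byrow = TRUE)           # rows: B1 = 0/1, cols: B2 = 0/1
jointA <- matrix(0, 3, 3)                        # rows: X1 = 0..2, cols: X2 = 0..2
for (a1 in 0:1) for (b1 in 0:1) for (a2 in 0:1) for (b2 in 0:1) {
  jointA[a1 + a2 + 1, b1 + b2 + 1] <- jointA[a1 + a2 + 1, b1 + b2 + 1] +
    pair[a1 + 1, b1 + 1] * pair[a2 + 1, b2 + 1]
}

# (B) Mixture: X2 = X1 with probability rho, otherwise independent
jointB <- rho * diag(marg) + (1 - rho) * outer(marg, marg)

# Pearson correlation computed from a 3x3 joint pmf (rows: X1, cols: X2)
corr_from_joint <- function(J) {
  m1 <- sum(vals * rowSums(J)); m2 <- sum(vals * colSums(J))
  v1 <- sum(vals^2 * rowSums(J)) - m1^2
  v2 <- sum(vals^2 * colSums(J)) - m2^2
  (sum(outer(vals, vals) * J) - m1 * m2) / sqrt(v1 * v2)
}

c(corr_from_joint(jointA), corr_from_joint(jointB))  # both 0.2, up to rounding
c(jointA[3, 3], jointB[3, 3])                        # P(X1 = 2, X2 = 2): 0.09 vs 0.10
```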