I'm trying to create (in r) the equivalent to the following MATLAB function that will generate n samples from a mixture of N(m1,(s1)^2) and N(m2, (s2)^2) with a fraction, alpha, from the first Gaussian.
I have a start, but the results are notably different between MATLAB and R (i.e., the MATLAB results give occasional values of +-8 but the R version never even gives a value of +-5). Please help me sort out what is wrong here. Thanks :-)
For Example:
Plot 1000 samples from a mix of N(0,1) and N(0,36) with 95% of samples from the first Gaussian. Normalize the samples to mean zero and standard deviation one.
MATLAB
function
function y = gaussmix(n,m1,m2,s1,s2,alpha)
y = zeros(n,1);
U = rand(n,1);
I = (U < alpha)
y = I.*(randn(n,1)*s1+m1) + (1-I).*(randn(n,1)*s2 + m2);
implementation
P = gaussmix(1000,0,0,1,6,.95)
P = (P-mean(P))/std(P)
plot(P)
axis([0 1000 -15 15])
hist(P)
axis([-15 15 0 1000])
resulting plot
![](https://www.manongdao.com/static/images/pcload.jpg)
resulting hist
![](https://www.manongdao.com/static/images/pcload.jpg)
R
yn <- rbinom(1000, 1, .95)
s <- rnorm(1000, 0 + 0*yn, 1 + 36*yn)
sn <- (s-mean(s))/sd(s)
plot(sn, xlim=range(0,1000), ylim=range(-15,15))
hist(sn, xlim=range(-15,15), ylim=range(0,1000))
resulting plot
![](https://www.manongdao.com/static/images/pcload.jpg)
resulting hist
![](https://www.manongdao.com/static/images/pcload.jpg)
As always, THANK YOU!
SOLUTION
gaussmix <- function(nsim,mean_1,mean_2,std_1,std_2,alpha){
U <- runif(nsim)
I <- as.numeric(U<alpha)
y <- I*rnorm(nsim,mean=mean_1,sd=std_1)+
(1-I)*rnorm(nsim,mean=mean_2,sd=std_2)
return(y)
}
z1 <- gaussmix(1000,0,0,1,6,0.95)
z1_standardized <- (z1-mean(z1))/sqrt(var(z1))
z2 <- gaussmix(1000,0,3,1,1,0.80)
z2_standardized <- (z2-mean(z2))/sqrt(var(z2))
z3 <- rlnorm(1000)
z3_standardized <- (z3-mean(z3))/sqrt(var(z3))
par(mfrow=c(2,3))
hist(z1_standardized,xlim=c(-10,10),ylim=c(0,500),
main="Histogram of 95% of N(0,1) and 5% of N(0,36)",
col="blue",xlab=" ")
hist(z2_standardized,xlim=c(-10,10),ylim=c(0,500),
main="Histogram of 80% of N(0,1) and 10% of N(3,1)",
col="blue",xlab=" ")
hist(z3_standardized,xlim=c(-10,10),ylim=c(0,500),
main="Histogram of samples of LN(0,1)",col="blue",xlab=" ")
##
plot(z1_standardized,type='l',
main="1000 samples from a mixture N(0,1) and N(0,36)",
col="blue",xlab="Samples",ylab="Mean",ylim=c(-10,10))
plot(z2_standardized,type='l',
main="1000 samples from a mixture N(0,1) and N(3,1)",
col="blue",xlab="Samples",ylab="Mean",ylim=c(-10,10))
plot(z3_standardized,type='l',
main="1000 samples from LN(0,1)",
col="blue",xlab="Samples",ylab="Mean",ylim=c(-10,10))
There are two problems, I think ... (1) your R code is creating a mixture of normal distributions with standard deviations of 1 and 37. (2) By setting prob
equal to alpha in your rbinom()
call, you're getting a fraction alpha in the second mode rather than the first. So what you are getting is a distribution that is mostly a Gaussian with sd 37, contaminated by a 5% mixture of Gaussian with sd 1, rather than a Gaussian with sd 1 that is contaminated by a 5% mixture of a Gaussian with sd 6. Scaling by the standard deviation of the mixture (which is about 36.6) basically reduces it to a standard Gaussian with a slight bump near the origin ...
(The other answers posted here do solve your problem perfectly well, but I thought you might be interested in a diagnosis ...)
A more compact (and perhaps more idiomatic) version of your Matlab gaussmix
function (I think runif(n)<alpha
is slightly more efficient than rbinom(n,size=1,prob=alpha)
)
gaussmix <- function(n,m1,m2,s1,s2,alpha) {
I <- runif(n)<alpha
rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
set.seed(1001)
s <- gaussmix(1000,0,0,1,6,0.95)
Not that you asked for it, but the mclust
package offers a way to generalize your problem to more dimensions and diverse covariance structures. See ?mclust::sim
. The example task would be done this way:
require(mclust)
simdata = sim(modelName = "V",
parameters = list(pro = c(0.95, 0.05),
mean = c(0, 0),
variance = list(modelName = "V",
d = 1,
G = 2,
sigmasq = c(0, 36))),
n = 1000)
plot(scale(simdata[,2]), type = "h")
I recently wrote the density and sampling function of a multinomial mixture of normal distributions:
dmultiNorm <- function(x,means,sds,weights)
{
if (length(means)!=length(sds)) stop("Length of means must be equal to length of standard deviations")
N <- length(x)
n <- length(means)
if (missing(weights))
{
weights <- rep(1,n)
}
if (length(weights)!=n) stop ("Length of weights not equal to length of means and sds")
weights <- weights/sum(weights)
dens <- numeric(N)
for (i in 1:n)
{
dens <- dens + weights[i] * dnorm(x,means[i],sds[i])
}
return(dens)
}
rmultiNorm <- function(N,means,sds,weights,scale=TRUE)
{
if (length(means)!=length(sds)) stop("Length of means must be equal to length of standard deviations")
n <- length(means)
if (missing(weights))
{
weights <- rep(1,n)
}
if (length(weights)!=n) stop ("Length of weights not equal to length of means and sds")
Res <- numeric(N)
for (i in 1:N)
{
s <- sample(1:n,1,prob=weights)
Res[i] <- rnorm(1,means[s],sds[s])
}
return(Res)
}
With means
being a vector of means, sds
being a vector of standard deviatians and weights
being a vector with proportional probabilities to sample from each of the distributions. Is this useful to you?
Here is code to do this task:
"For Example: Plot 1000 samples from a mix of N(0,1) and N(0,36) with 95% of samples from the first Gaussian. Normalize the samples to mean zero and standard deviation one."
plot(multG <- c( rnorm(950), rnorm(50, 0, 36))[sample(1000)] , type="h")
scmulG <- scale(multG)
summary(scmulG)
#-----------
V1
Min. :-9.01845
1st Qu.:-0.06544
Median : 0.03841
Mean : 0.00000
3rd Qu.: 0.13940
Max. :12.33107
![](https://www.manongdao.com/static/images/pcload.jpg)