I'm trying to superimpose a mixed distribution plot with a plot of identified component distributions, using ggplot2
package and a user-defined function for its stat_function()
. I have tried two approaches. The distribution identification is normal in both cases:
number of iterations= 11
summary of normalmixEM object:
comp 1 comp 2
lambda 0.348900 0.65110
mu 2.019878 4.27454
sigma 0.237472 0.43542
loglik at estimate: -276.3643
A) However, in the first approach, the output contains the following error:
Error in eval(expr, envir, enclos) : object 'comp.number' not found
The reproducible example for this approach follows (faithful is a built-in R
dataset):
library(ggplot2)
library(mixtools)
DISTRIB_COLORS <- c("green", "red")
NUM_COMPONENTS <- 2
set.seed(12345)
mix.info <- normalmixEM(faithful$eruptions, k = NUM_COMPONENTS,
maxit = 100, epsilon = 0.01)
summary(mix.info)
plot.components <- function(mix, comp.number) {
g <- stat_function(fun = function(mix, comp.number)
{mix$lambda[comp.number] *
dnorm(x, mean = mix$mu[comp.number],
sd = mix$sigma[comp.number])},
geom = "line", aes(colour = DISTRIB_COLORS[comp.number]))
return (g)
}
g <- ggplot(faithful, aes(x = waiting)) +
geom_histogram(binwidth = 0.5)
distComps <- lapply(seq(NUM_COMPONENTS),
function(i) plot.components(mix.info, i))
print(g + distComps)
B) The second approach doesn't produce any errors. However, the only plot visible is the one of the mixed distribution. Plots of its component distributions are not produced or visible (with some degree of confidence it seems to me that the a straight horizontal line y=0 is also visible, but I'm not 100% sure):
The following is a reproducible example for this approach:
library(ggplot2)
library(mixtools)
DISTRIB_COLORS <- c("green", "red")
NUM_COMPONENTS <- 2
set.seed(12345)
mix.info <- normalmixEM(faithful$eruptions, k = NUM_COMPONENTS,
maxit = 100, epsilon = 0.01)
summary(mix.info)
plot.components <- function(x, mix, comp.number, ...) {
mix$lambda[comp.number] *
dnorm(x, mean = mix$mu[comp.number],
sd = mix$sigma[comp.number], ...)
}
g <- ggplot(faithful, aes(x = waiting)) +
geom_histogram(binwidth = 0.5)
distComps <- lapply(seq(NUM_COMPONENTS), function(i)
stat_function(fun = plot.components,
args = list(mix = mix.info, comp.number = i)))
print(g + distComps)
Question: What are the problems in each of the approaches and which one is (more) correct?
UPDATE: Just minutes after posting, I realized that I forgot to include the line-drawing part of the stat_function()
for the second approach, so that the corresponding lines are as follow:
distComps <- lapply(seq(NUM_COMPONENTS), function(i)
stat_function(fun = plot.components,
args = list(mix = mix.info, comp.number = i)),
geom = "line", aes(colour = DISTRIB_COLORS[i]))
However, this update produces an error, source of which I don't quite understand:
Error in FUN(1:2[[1L]], ...) :
unused arguments (geom = "line", list(colour = DISTRIB_COLORS[i]))