I have a sample of 1 million records drawn from my original data. For reference, the following dummy data should produce a roughly similar distribution:
b <- data.frame(matrix(rnorm(2000000, mean=c(8,17), sd=2)))
c <- b[sample(nrow(b), 1000000), ]
I believed the histogram to be a mixture of two log-normal distributions, so I tried to fit the combined distribution with the EM algorithm using the following code:
install.packages("mixtools")
library(mixtools)
# normalmixEM() returns an object of class "mixEM" for a mixture of normal distributions
c1 <- normalmixEM(c, lambda=NULL, mu=NULL, sigma=NULL)
plot(c1, density=TRUE)
The first plot is a log-likelihood plot, and the second (shown when you press Return again) draws the fitted component density curves over a histogram of the data.
As I mentioned, c1 is an object of class mixEM, and plot() can handle that. I want to fill the density curves with colors. This would be easy with ggplot2, but ggplot() does not support objects of class mixEM and throws this message:
"ggplot doesn't know how to deal with data of class mixEM". Is there another approach I can take for this problem? Any suggestions are greatly appreciated!
Thanks!
Look at the structure of the returned object (its components are documented in ?normalmixEM):
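A minimal sketch of that inspection, run on a smaller simulated sample (10,000 values instead of 1 million, an assumption made only so the example is fast); `x` and `fit` stand in for the question's `c` and `c1`:

```r
library(mixtools)

set.seed(1)
# Same two-component dummy data as the question, but only 10,000 values
x <- rnorm(10000, mean = c(8, 17), sd = 2)
fit <- normalmixEM(x, k = 2)

# A mixEM object is a plain list, so str() and names() show its parts
str(fit, max.level = 1)
names(fit)

# The three components needed to redraw the fitted densities
fit$lambda  # mixing proportions (they sum to 1)
fit$mu      # component means
fit$sigma   # component standard deviations
```

Because it is an ordinary list, these vectors can be passed straight to ggplot2 even though ggplot() itself cannot take the mixEM object.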
Now what:
The lambda, mu, and sigma components define the fitted normal densities. You can plot these in ggplot using qplot and stat_function. But first, make a function that returns the normal densities scaled by their mixing proportions, and then pass it to stat_function once per component.
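A minimal sketch of such a scaling function; the name `plot_mix_comps` and its argument names are illustrative assumptions, not part of mixtools:

```r
# Hypothetical helper: density of one mixture component, weighted by its
# mixing proportion lam, so the weighted components add up to the full mixture
plot_mix_comps <- function(x, mu, sigma, lam) {
  lam * dnorm(x, mean = mu, sd = sigma)
}

# Example call: weighted density of a component with mean 8, sd 2, weight 0.5
plot_mix_comps(c(8, 17), mu = 8, sigma = 2, lam = 0.5)
```

Scaling by lambda matters: an unweighted dnorm() curve would be too tall relative to a density-scaled histogram.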
Or use whatever ggplot skills you have; transparent colours on the densities might be nice.
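A sketch of the stat_function approach with one layer per component and transparent fills. It assumes ggplot2 3.3.0 or later (for `after_stat()`) and uses `ggplot()` rather than `qplot()`, which is deprecated since ggplot2 3.4.0. The helper `plot_mix_comps`, the sample size, bin width, and colours are illustrative choices:

```r
library(ggplot2)
library(mixtools)

set.seed(1)
# Smaller dummy sample than the question's, to keep the fit fast
x <- rnorm(10000, mean = c(8, 17), sd = 2)
fit <- normalmixEM(x, k = 2)

# Hypothetical helper: component density weighted by its mixing proportion
plot_mix_comps <- function(x, mu, sigma, lam) lam * dnorm(x, mu, sigma)

p <- ggplot(data.frame(x = x), aes(x = x)) +
  # Histogram on the density scale so it matches the weighted curves
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5,
                 colour = "grey60", fill = "white") +
  # One stat_function() layer per component; geom = "area" fills under the curve
  stat_function(geom = "area", fun = plot_mix_comps,
                args = list(mu = fit$mu[1], sigma = fit$sigma[1], lam = fit$lambda[1]),
                fill = "red", alpha = 0.4) +
  stat_function(geom = "area", fun = plot_mix_comps,
                args = list(mu = fit$mu[2], sigma = fit$sigma[2], lam = fit$lambda[2]),
                fill = "blue", alpha = 0.4) +
  labs(y = "Density")
print(p)
```

Each component needs its own stat_function() call because the values in `args` cannot be mapped as aesthetics.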
Here's a slightly different approach that uses geom_polygon(...) instead of multiple calls to stat_function(...). One problem with stat_function(...) is that the secondary arguments (mu, sigma, and lambda in this example), which are passed through the args=list(...) parameter, cannot be included in an aesthetic mapping, so you need one call to stat_function(...) per component, as in @Spacedman's solution. This approach builds the PDFs outside of ggplot and uses a single call to geom_polygon(...). As a result, it works without modification for an arbitrary number of distributions in the mixture.
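A minimal sketch of the geom_polygon approach, assuming ggplot2 3.3.0 or later (for `after_stat()`) and mixtools. The grid of 500 points, the padding with `y = 0`, and the names `grid` and `pdfs` are illustrative choices, not taken from the original answer:

```r
library(ggplot2)
library(mixtools)

set.seed(1)
# Smaller dummy sample than the question's, to keep the fit fast
x <- rnorm(10000, mean = c(8, 17), sd = 2)
fit <- normalmixEM(x, k = 2)

# Evaluate every weighted component PDF on a common grid, outside ggplot.
# Padding both ends with y = 0 closes each polygon along the x-axis.
grid <- seq(min(x), max(x), length.out = 500)
pdfs <- do.call(rbind, lapply(seq_along(fit$mu), function(i) {
  dens <- fit$lambda[i] * dnorm(grid, mean = fit$mu[i], sd = fit$sigma[i])
  data.frame(x = c(grid[1], grid, grid[length(grid)]),
             y = c(0, dens, 0),
             comp = i)
}))
pdfs$comp <- factor(pdfs$comp)

# A single geom_polygon() call draws all components; fill is mapped to comp
p <- ggplot() +
  geom_histogram(data = data.frame(x = x), aes(x = x, y = after_stat(density)),
                 binwidth = 0.5, colour = "grey60", fill = "white") +
  geom_polygon(data = pdfs, aes(x = x, y = y, fill = comp), alpha = 0.4) +
  labs(y = "Density", fill = "Component")
print(p)
```

Fitting with `k = 3` instead would produce a third polygon and legend entry with no other code changes, since the loop runs over `seq_along(fit$mu)`.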