I've been trying to superimpose a normal curve over my histogram with ggplot2.
My code:
data <- read.csv (path...)
ggplot(data, aes(V2)) +
geom_histogram(alpha=0.3, fill='white', colour='black', binwidth=.04)
I tried several things:
+ stat_function(fun=dnorm)
...didn't change anything.
+ stat_density(geom = "line", colour = "red")
...gave me a straight red line on the x-axis.
+ geom_density()
...doesn't work for me because I want to keep frequency counts on the y-axis, not density values.
Any suggestions?
Thanks in advance for any tips!
Solution found!
+geom_density(aes(y=0.045*..count..), colour="black", adjust=4)
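For anyone reading this later, here is a minimal sketch of the same idea with the constant written out. It assumes a recent ggplot2 (3.3.0 or later, where after_stat() replaces the ..count.. notation), and the data frame df is made-up sample data, not the asker's file. The count variable computed by geom_density is density times the number of observations, so multiplying by the bin width (0.04 in the question) puts the curve on the same scale as the histogram counts. Note that this draws a smoothed kernel density, not a fitted normal curve.

```r
library(ggplot2)

# Hypothetical sample data standing in for the V2 column in the question.
set.seed(3)
df <- data.frame(V2 = rnorm(500, mean = 1, sd = 0.2))

binwidth <- 0.04  # same bin width as the histogram in the question

p <- ggplot(df, aes(V2)) +
  geom_histogram(alpha = 0.3, fill = "white", colour = "black",
                 binwidth = binwidth) +
  # count = density * n; multiplying by binwidth gives expected counts per bin
  geom_density(aes(y = after_stat(count) * binwidth),
               colour = "black", adjust = 4)

print(p)
```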
I think I've got it. This code should do it:
Note: I used qplot, but you can use the more versatile ggplot.
This is an extended comment on JWilliman's answer. I found J's answer very useful. While playing around I discovered a way to simplify the code. I'm not saying it is a better way, but I thought I would mention it.
Note that JWilliman's answer puts the count on the y-axis and uses a "hack" to scale the normal density approximation to match (unscaled, the curve would cover a total area of 1 and therefore have a much lower peak than the histogram).
Main point of this comment: simpler syntax inside stat_function, by passing the needed parameters to the aesthetics function, e.g. aes(x = x, mean = 0, sd = 1, binwidth = 0.3, n = 1000). This avoids having to pass args = to stat_function and is therefore more user-friendly. Okay, it's not very different, but hopefully someone will find it interesting.
This has been answered here and partially here.
If you want the y-axis to have frequency counts, then the normal curve needs to be scaled according to the number of observations and the binwidth.
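A minimal sketch of that scaling, assuming made-up sample data (df, and the helper name scaled_dnorm, are illustrative and not from the original post). The normal density is multiplied by the number of observations and the bin width, so the area under the curve matches the total area of the histogram bars.

```r
library(ggplot2)

# Hypothetical sample data standing in for the V2 column in the question.
set.seed(1)
df <- data.frame(V2 = rnorm(500, mean = 1, sd = 0.2))

binwidth <- 0.04
n_obs <- nrow(df)
mu <- mean(df$V2)
sigma <- sd(df$V2)

# Normal density rescaled to the count scale: density * n * binwidth.
scaled_dnorm <- function(x) {
  dnorm(x, mean = mu, sd = sigma) * n_obs * binwidth
}

p <- ggplot(df, aes(V2)) +
  geom_histogram(alpha = 0.3, fill = "white", colour = "black",
                 binwidth = binwidth) +
  stat_function(fun = scaled_dnorm, colour = "red")

print(p)
```

This also explains why plain stat_function(fun = dnorm) appeared to do nothing in the question: it draws a standard normal (mean 0, sd 1) on the density scale, which is tiny next to bars measured in counts.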
EDIT
Or, for a more flexible approach that allows for use of facets and draws upon an approach listed here, create a separate dataset containing the data for the normal curves and overlay these.
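A rough sketch of the separate-dataset approach, again with invented data (the columns group and value, and the curves data frame, are assumptions for illustration). Each group gets its own normal curve, computed from that group's mean and standard deviation and scaled by that group's size and the bin width, then overlaid with geom_line so it follows the facets.

```r
library(ggplot2)

# Hypothetical two-group data; not from the original question.
set.seed(2)
df <- data.frame(
  group = rep(c("a", "b"), each = 300),
  value = c(rnorm(300, mean = 0, sd = 1), rnorm(300, mean = 3, sd = 0.5))
)
binwidth <- 0.25

# Build one scaled normal curve per group.
curves <- do.call(rbind, lapply(split(df, df$group), function(d) {
  x <- seq(min(d$value), max(d$value), length.out = 200)
  data.frame(
    group = d$group[1],
    value = x,
    # density * n * binwidth puts the curve on the count scale
    count = dnorm(x, mean = mean(d$value), sd = sd(d$value)) *
      nrow(d) * binwidth
  )
}))

p <- ggplot(df, aes(value)) +
  geom_histogram(binwidth = binwidth, alpha = 0.3,
                 fill = "white", colour = "black") +
  geom_line(data = curves, aes(y = count), colour = "red") +
  facet_wrap(~ group)

print(p)
```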