I want a density histogram (bar heights summing to 1) for some discrete data. I have tried a few approaches, but none is entirely satisfactory.
Generate some data:
#data
set.seed(-999)
d.test = data.frame(score = round(rnorm(100,1)))
mean.score = mean(d.test[,1])
d1 = as.data.frame(prop.table(table(d.test)))
The first attempt places the bars correctly, centered on each value, but places the `geom_vline()` line incorrectly. This is because the x-axis is discrete (a factor), so the mean is positioned by level index rather than by value. The mean value is .89.
ggplot(data=d1, aes(x=d.test, y=Freq)) +
geom_bar(stat="identity", width=.5) +
geom_vline(xintercept=mean.score, color="blue", linetype="dashed")
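A minimal base-R sketch of why the line lands in the wrong place, assuming ggplot2's usual behavior of drawing the k-th factor level at x = k on a discrete axis. The names `lv` and `pos` are illustrative only:

```r
# Same data as above.
set.seed(-999)
score <- round(rnorm(100, 1))

lv  <- sort(unique(score))   # observed values, which become the factor levels
pos <- seq_along(lv)         # position of each level on a discrete axis

# Each bar sits at an integer position, not at its score value.
data.frame(level = lv, position = pos)

# The xintercept is read on the position scale, so the mean score is drawn
# relative to positions 1..n rather than relative to the score values.
mean(score)
```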
The second attempt places the `geom_vline()` line correctly (because the x-axis is continuous), but the bars are misplaced, and the `width` parameter does not appear to be modifiable when the x-axis is continuous (see here). I also tried the `size` parameter, which has no effect either. Ditto for `hjust`.
ggplot(d.test, aes(x=score)) +
geom_histogram(aes(y=..count../sum(..count..)), width=.5) +
geom_vline(xintercept=mean.score, color="blue", linetype="dashed")
Any ideas? My bad idea is to rescale the mean so that it fits with the factor levels and use the first solution. This won't work well in case some of the factor levels are 'missing', e.g. 1, 2, 4 with no factor for 3 because no datapoint had that value. If the mean is 3.5, rescaling this is odd (x-axis is no longer an interval scale).
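The rescaling idea could be sketched with linear interpolation between level positions. `approx()` is base R; the helper `to_position()` is a made-up name, and the values 1, 2, 4 are the hypothetical case described above:

```r
# Hypothetical data: values 1, 2 and 4 observed, nothing observed at 3.
lv  <- c(1, 2, 4)
pos <- seq_along(lv)   # positions on the discrete axis: 1, 2, 3

# Map a value on the score scale to a position on the discrete axis
# by interpolating linearly between neighboring levels.
to_position <- function(x) approx(x = lv, y = pos, xout = x)$y

to_position(1.5)   # 1.5  (half a score unit moves half a position)
to_position(3.5)   # 2.75 (half a score unit now moves a quarter position)
```

The two calls show the problem: the same distance on the score scale covers different distances on the axis depending on where the gap is.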
Another idea is this:
ggplot(d.test, aes(x=score)) +
stat_bin(binwidth=.5, aes(y= ..density../sum(..density..)), hjust=-.5) +
scale_x_continuous(breaks = -2:5) + #add ticks back
geom_vline(xintercept=mean.score, color="blue", linetype="dashed")
But this requires adjusting the breaks, and the bars are still in the wrong positions (not centered). Unfortunately, `hjust` does not appear to work.
How do I get everything I want?
- density sums to 1
- bars centered above values
- `geom_vline()` at the correct number
- `width=.5`
With base graphics, one could perhaps solve this problem by plotting twice on the x-axis. Is there some similar way here?
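One direction that might meet all four requirements is to compute the proportions up front, as in the first attempt, but keep the x variable numeric so the axis stays continuous. This is a sketch under that assumption, not a verified answer; `tab`, `d2` and `prop` are names introduced here for illustration:

```r
library(ggplot2)

set.seed(-999)
d.test <- data.frame(score = round(rnorm(100, 1)))
mean.score <- mean(d.test$score)

# Proportion of observations at each observed value.
tab <- prop.table(table(d.test$score))

# Convert the table names back to numbers so x is continuous, not a factor.
d2 <- data.frame(
  score = as.numeric(names(tab)),
  prop  = as.numeric(tab)
)

p <- ggplot(d2, aes(x = score, y = prop)) +
  # Precomputed heights; bars are centered on each numeric value.
  geom_bar(stat = "identity", width = 0.5) +
  # The axis is continuous, so the mean is placed by value.
  geom_vline(xintercept = mean.score, color = "blue", linetype = "dashed") +
  # One tick per integer, including values with no observations.
  scale_x_continuous(breaks = seq(min(d2$score), max(d2$score)))
p
```

Because the x values are numbers, a value with no observations simply leaves an empty slot, and the axis remains an interval scale.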