ggplot2 density histogram with width=.5, vline and

Posted 2019-05-31 15:15

Question:

I want a density histogram (one whose bar heights sum to 1) for some discrete data. I have tried several approaches, but none was entirely satisfactory.

Generate some data:

#data
set.seed(-999)
d.test = data.frame(score = round(rnorm(100,1)))
mean.score = mean(d.test[,1])
d1 = as.data.frame(prop.table(table(d.test)))

The first attempt places the bars correctly, centered on top of their values, but misplaces the geom_vline() line. This is because the x-axis is discrete (a factor), so the mean is positioned by level index (1, 2, 3, ...) rather than by value. The mean is .89.

ggplot(data=d1, aes(x=d.test, y=Freq)) +
  geom_bar(stat="identity", width=.5) +
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed")
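To see why the line lands in the wrong place, it can help to compare the positions a discrete axis uses with the values its labels stand for. The sketch below rebuilds the proportion table with an explicit column name (`score`, chosen here purely for illustration) and prints both side by side.

```r
# Same data as in the question.
set.seed(-999)
d.test <- data.frame(score = round(rnorm(100, 1)))

# Proportion table with an explicit column name ("score" is an
# illustrative choice, not the name used in the question's d1).
d1 <- as.data.frame(prop.table(table(score = d.test$score)))

# Positions a discrete axis works with: the integer level codes 1, 2, 3, ...
level_pos <- as.integer(d1$score)

# Values the labels stand for: go through character first, because
# as.numeric() on a factor returns the level codes, not the labels.
level_val <- as.numeric(as.character(d1$score))

print(data.frame(label = d1$score, position = level_pos, value = level_val))
```

Because the levels sit at positions 1, 2, 3, ..., an xintercept equal to the mean is drawn at that raw position on the level-index scale, regardless of which score each level represents.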

The second attempt places the vertical line correctly (because the x-axis is continuous), but the bars are misplaced, and the width parameter does not appear to be adjustable when the x-axis is continuous (see here). I also tried the size parameter, which has no effect either. The same goes for hjust.

ggplot(d.test, aes(x=score)) +
  geom_histogram(aes(y=..count../sum(..count..)), width=.5) +
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed")

Any ideas? My own (bad) idea is to rescale the mean so that it fits the factor levels and then use the first solution. This won't work well if some of the factor levels are 'missing', e.g. levels 1, 2, 4 with no level for 3 because no data point had that value. If the mean is 3.5, rescaling it is awkward, because the x-axis is no longer an interval scale.

Another idea is this:

ggplot(d.test, aes(x=score)) +
  stat_bin(binwidth=.5, aes(y= ..density../sum(..density..)), hjust=-.5) +
  scale_x_continuous(breaks = -2:5) + #add ticks back
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed")

But this requires adjusting the breaks, and the bars are still in the wrong positions (not centered). Unfortunately, hjust does not appear to work.
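One possible way to center the bins without hjust is the `center` argument of `geom_histogram()`, which fixes where bin centers fall. The following is a minimal sketch, assuming a reasonably recent ggplot2 (3.3.0 or later for `after_stat()`); it has not been checked against the ggplot2 version in use when this question was written.

```r
library(ggplot2)

# Same data as in the question.
set.seed(-999)
d.test <- data.frame(score = round(rnorm(100, 1)))
mean.score <- mean(d.test$score)

p <- ggplot(d.test, aes(x = score)) +
  # binwidth = 0.5 with center = 0 puts bin centers at ..., 0, 0.5, 1, ...,
  # so every integer score sits in the middle of its own bin.
  # count / sum(count) rescales the bar heights so they sum to 1.
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 binwidth = 0.5, center = 0) +
  geom_vline(xintercept = mean.score, color = "blue", linetype = "dashed") +
  # One tick per integer score, derived from the data range.
  scale_x_continuous(breaks = seq(min(d.test$score), max(d.test$score))) +
  ylab("Proportion")

print(p)
```

The bins centered on half-integers stay empty for integer data, which leaves the gaps between bars that width=.5 was meant to produce.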

How do I get everything I want?

  • density sums to 1
  • bars centered above values
  • vline() at the correct number
  • width=.5

With base graphics, one could perhaps solve this problem by plotting twice on the x-axis. Is there some similar way here?

Answer 1:

It sounds like you just want to make sure that your x-axis values are numeric rather than factors:

ggplot(data=d1, aes(x=as.numeric(as.character(d.test)), y=Freq)) +
  geom_bar(stat="identity", width=.5) +
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed") + 
  scale_x_continuous(breaks=-2:3)

which gives:

[figure: the resulting bar chart]
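A variation that skips the intermediate table altogether might look like the sketch below. It assumes ggplot2 3.3.0 or later (for `after_stat()`) and lets `geom_bar()` count the raw numeric scores itself, so the x-axis stays continuous from the start. This is an alternative illustration, not part of the original answer.

```r
library(ggplot2)

# Same data as in the question.
set.seed(-999)
d.test <- data.frame(score = round(rnorm(100, 1)))
mean.score <- mean(d.test$score)

p <- ggplot(d.test, aes(x = score)) +
  # geom_bar() counts rows per distinct x value; dividing by the total
  # turns the counts into proportions that sum to 1.
  geom_bar(aes(y = after_stat(count / sum(count))), width = 0.5) +
  geom_vline(xintercept = mean.score, color = "blue", linetype = "dashed") +
  # One tick per integer score, derived from the data range.
  scale_x_continuous(breaks = seq(min(d.test$score), max(d.test$score))) +
  ylab("Proportion")

print(p)
```

Since x is numeric here, a score that never occurs simply leaves an empty slot on the axis, so the spacing between bars still reflects the actual values.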