ggplot2 geom_boxplot: annotating counts without co

Posted 2019-07-23 06:52

I want to add annotations such as n=5, n=4 with the number of data points in each boxplot at the top edge of my geom_boxplot plot.

I am aware I can do this with geom_text by precomputing the counts, but it seems that ggplot2, with all its wonderful binning and summarizing functionality, ought to be able to do this itself.

Let's assume we have these data:

library(tidyverse)

dd = tribble(
    ~val, ~kind,
    1,    'A',
    3,    'A',
    5,    'A',
    5,    'A',
    6,    'A',
    3,    'B',
    4,    'B',
    4,    'B',
    5,    'B'
)

I have tried this:

> base = ggplot(dd, aes(x=kind, y=val)) + geom_boxplot()
> base + geom_text(y=6, label=..count.., stat='count')

Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomText,  : 
  object '..count..' not found

Presumably, geom_text has simply ignored my stat parameter?

Next, I tried this:

> base + stat_count(aes(y=6, label=..count..), geom='text')

Error: stat_count() must not be used with a y aesthetic.

Shouldn't it be my own problem whether I can do anything useful with the resulting ..count.., "y aesthetic" or not?

Both of these attempts appear sensible to me.
Can anybody explain conceptually why ggplot2 does not accept these commands?
And whether there is any approach with ggplot2-supplied counting that will work?

Tags: r ggplot2
1 answer
劳资没心,怎么记你
Reply #2 · 2019-07-23 07:08

This is a design limitation of ggplot2. If Hadley rewrote it now he'd probably implement it differently. Conceptually, you'd want to have two separate mappings, one for the stat and one for the geom. However, ggplot2 doesn't work that way. It only has one set of mappings that for the most part is applied to both the stat and the geom. There's a bit of a workaround in that you can use ..variable.. to refer in the geom to variables calculated in the stat, but the mappings are still all thrown together.
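As a minimal sketch of that workaround (assuming ggplot2 3.3.0 or later, where after_stat(count) is the newer spelling of ..count..; the data frame simply re-creates the question's data), the computed count can be used as a label as long as no y aesthetic is mapped for the layer:

```r
library(ggplot2)

# Re-creation of the question's data as a plain data frame.
dd <- data.frame(
  val  = c(1, 3, 5, 5, 6, 3, 4, 4, 5),
  kind = c("A", "A", "A", "A", "A", "B", "B", "B", "B")
)

# No y is mapped anywhere, so stat_count() accepts the layer.
# The label refers to the `count` variable computed by the stat.
p <- ggplot(dd, aes(x = kind)) +
  geom_bar() +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -0.5)
```

The text's y position is not chosen by us here: it falls back to the computed count, which is exactly the "thrown together" behavior described above.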

There is no functionality currently that allows you to specify that the y aesthetic is only meant for geom_text and that stat_count should ignore it.
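One workaround that still lets ggplot2 do the counting is stat_summary() with a custom fun.data function. This is only a sketch: n_label is a hypothetical helper name, and the fixed height of 6 is taken from the question's attempt.

```r
library(ggplot2)

# Re-creation of the question's data as a plain data frame.
dd <- data.frame(
  val  = c(1, 3, 5, 5, 6, 3, 4, 4, 5),
  kind = c("A", "A", "A", "A", "A", "B", "B", "B", "B")
)

# Hypothetical helper: receives the y values of one group and returns
# a one-row data frame with the label text and the height to draw it at.
n_label <- function(y) {
  data.frame(y = 6, label = paste0("n=", length(y)))
}

p <- ggplot(dd, aes(x = kind, y = val)) +
  geom_boxplot() +
  stat_summary(fun.data = n_label, geom = "text")
```

This works because stat_summary() expects a y aesthetic and calls the function once per x position, so the y mapping inherited from ggplot() is not a problem.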

Another scenario where this comes up all the time is vertical or horizontal versions of stats that otherwise are horizontal or vertical. There's an entire package for that, ggstance. Conceptually, this doesn't make much sense. Why can't I calculate a density using stat_density(), and then map the "x" variable of the density curve (i.e., the variable the density is calculated over) to the y aesthetic and the "y" variable (i.e., the height of the density) to the x aesthetic? Instead, I need to use stat_xdensity(), which is identical to stat_density() except that it swaps x and y.

I've been thinking that it might be possible to extend ggplot2 without breaking it by adding a separate layer()-type function that takes two aesthetics arguments, one for the stat and one for the geom. I.e., something like:

layer2(aes_geom(y = ..x.., x = ..y..),
       aes_stat(x = variable),
       geom = "line", stat = "density")

(This would draw a vertical density line, similar to the outline of half a violin plot.)
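For comparison, a sketch of how a vertical density line can be drawn in ggplot2 3.3.0 or later, assuming the orientation argument that those versions add to many stats and geoms. This is not the two-mapping design proposed above, only a way to get a similar picture:

```r
library(ggplot2)

# Mapping the variable to y and setting orientation = "y" makes
# stat_density() compute the density over y and place its height on x.
p <- ggplot(iris, aes(y = Sepal.Length)) +
  stat_density(geom = "line", orientation = "y")
```

The mappings are still shared between stat and geom; the stat simply understands that it has been flipped.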

One other non-intuitive limitation we often run into is that calculations in the aes transformations don't respect the data grouping. For example, let's say we want to mark the median line of boxplots with a red dot. We might try:

ggplot(iris, aes(x = Species, y = Sepal.Length)) + 
  geom_boxplot() +
  geom_point(aes(x = Species, y = median(Sepal.Length)), size = 3, color = "red")

This is the result:

[Image: boxplots of Sepal.Length by Species, with the red dot at the same height on every box]

The median is calculated over the entire data column, not separately by species.
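A sketch of one way to get per-species medians is to let a stat do the summarizing instead of computing inside aes(). This assumes ggplot2 3.3.0 or later, where the stat_summary() argument is called fun (older versions use fun.y):

```r
library(ggplot2)

# stat_summary() applies `fun` separately at each x position,
# so each species gets its own median.
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  stat_summary(fun = median, geom = "point", size = 3, color = "red")
```

Each red dot now sits on the median line of its own box.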
