I want to add annotations such as n=5, n=4 with the number of data points in each boxplot at the top edge of my geom_boxplot plot.
I am aware I can do this with geom_text by precomputing the counts, but it seems that ggplot2, with all its wonderful binning and summarizing functionality, ought to be able to do this itself?
Let's assume we have these data:
library(tidyverse)
dd = tribble(
~val, ~kind,
1, 'A',
3, 'A',
5, 'A',
5, 'A',
6, 'A',
3, 'B',
4, 'B',
4, 'B',
5, 'B'
)
I have tried this:
> base = ggplot(dd, aes(x=kind, y=val)) + geom_boxplot()
> base + geom_text(y=6, label=..count.., stat='count')
Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomText, :
object '..count..' not found
Presumably, geom_text has simply ignored my stat parameter?
Next, I tried this:
> base + stat_count(aes(y=6, label=..count..), geom='text')
Error: stat_count() must not be used with a y aesthetic.
Shouldn't it be my own problem whether I can do anything useful with the resulting ..count.., "y aesthetic" or not?
Both of these attempts appear sensible to me.
Can anybody explain conceptually why ggplot2 does not accept these commands?
And whether there is any approach with ggplot2-supplied counting that will work?
This is a design limitation of ggplot2. If Hadley rewrote it now he'd probably implement it differently. Conceptually, you'd want to have two separate mappings, one for the stat and one for the geom. However, ggplot2 doesn't work that way. It only has one set of mappings that for the most part is applied to both the stat and the geom. There's a bit of a workaround in that you can use ..variable.. to refer in the geom to variables calculated in the stat, but the mappings are still all thrown together.
There is no functionality currently that allows you to specify that the y aesthetic is only meant for geom_text and that stat_count should ignore it.
Another scenario where this comes up all the time is vertical or horizontal versions of stats that otherwise are horizontal or vertical. There's an entire package for that, ggstance. Conceptually, this doesn't make much sense. Why can't I calculate a density using stat_density(), and then map the "x" variable of the density curve (i.e., the variable the density is calculated over) to the y aesthetic and the "y" variable (i.e., the height of the density) to the x aesthetic? Instead, I need to use stat_xdensity(), which is identical to stat_density() except it swaps x and y.
I've been thinking that it might be possible to extend ggplot2 without breaking it by adding a separate layer()-type function that takes two aesthetics arguments, one for the stat and one for the geom. A call of that kind could, for example, draw a vertical density line, similar to the outline of half a violin plot.
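The two-mapping layer function described above is only a proposal. As a rough sketch of the same idea with what already exists, one can run the stat by hand with base R's density() and then choose the mapping freely. The names val, dens, height and p_vertical are made up for this illustration; the values are the val column from the question.

```r
library(ggplot2)

# Illustrative data: the val column from the question.
val <- c(1, 3, 5, 5, 6, 3, 4, 4, 5)

# "Stat" step done by hand: density() returns the evaluation points in $x
# and the estimated density heights in $y.
d <- density(val)
dens <- data.frame(val = d$x, height = d$y)

# "Geom" step with the mapping swapped: density height on the x axis,
# the data variable on the y axis, which gives a vertical density line.
p_vertical <- ggplot(dens, aes(x = height, y = val)) +
  geom_path()
```

Printing p_vertical draws the curve. This sidesteps the single shared mapping only because the summarizing happens outside ggplot2, which is exactly the precomputation the question hoped to avoid.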
One other non-intuitive limitation we often run into is that calculations inside aes() don't respect the data grouping. For example, let's say we want to mark the median line of boxplots with a red dot. We might try computing the median directly inside aes(). In the resulting plot, the median is calculated over the entire data column, not separately by species.
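A minimal sketch of this pitfall and one possible workaround, assuming the built-in iris data (the text only mentions "species", so the dataset is a guess). The helper names med_point and n_label are made up for the illustration. stat_summary() applies its fun.data function once per x group, and the same mechanism can also produce the n= labels from the question without precomputing a separate table.

```r
library(ggplot2)

# Pitfall: median() inside aes() sees the whole column, so every species
# gets the same red dot at the overall median.
p_naive <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot() +
  geom_point(aes(y = median(Sepal.Length)), colour = "red")

# Workaround: stat_summary() calls the function once per x group.
med_point <- function(y) data.frame(y = median(y))
p_grouped <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot() +
  stat_summary(fun.data = med_point, geom = "point", colour = "red")

# Same mechanism for the question's labels: one "n=..." per box at y = 6.
dd <- data.frame(
  val = c(1, 3, 5, 5, 6, 3, 4, 4, 5),
  kind = rep(c("A", "B"), times = c(5, 4))
)
n_label <- function(y) {
  data.frame(y = 6, label = paste0("n=", length(y)), stringsAsFactors = FALSE)
}
p_counts <- ggplot(dd, aes(kind, val)) +
  geom_boxplot() +
  stat_summary(fun.data = n_label, geom = "text")
```

Here y = 6 is hard-coded, as in the question's own attempts, so the label position has to be adjusted for other data.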