ggplot2: how to add sample numbers to density plot

Posted 2019-05-24 10:47

Question:

I am trying to generate a (grouped) density plot labelled with sample sizes.

Sample data:

set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))

The unlabelled density plot is generated as follows:

ggplot(df, aes(x = val, group = ab.class)) +
  geom_density(aes(fill = ab.class), alpha = 0.4)

What I want to do is add text labels somewhere near the peak of each density, showing the number of samples in each group. However, I cannot find the right combination of options to summarise the data in this way.

I tried to adapt the code suggested in this answer to a similar question on boxplots: https://stackoverflow.com/a/15720769/1836013

n_fun <- function(x){
  return(data.frame(y = max(x), label = paste0("n = ",length(x))))
}

ggplot(df, aes(x = val, group = ab.class)) +
  geom_density(aes(fill = ab.class), alpha = 0.4) +
  stat_summary(geom = "text", fun.data = n_fun)

However, this fails with Error: stat_summary requires the following missing aesthetics: y.

I also tried adding y = ..density.. within aes() for each of the geom_density() and stat_summary() layers, and in the ggplot() object itself... none of which solved the problem.

I know this could be achieved by manually adding labels for each group, but I was hoping for a solution that generalises, and e.g. allows the label colour to be set via aes() to match the densities.

Where am I going wrong?

Answer 1:

The y returned by fun.data is not the y aesthetic. stat_summary complains that it cannot find y, which must be specified either globally, in ggplot(df, aes(x = val, group = ab.class, y = ...)), or in the layer, in stat_summary(aes(y = ...)), if no global y is set. fun.data then computes where to display the point or text at each x from the y values supplied through aes. (I am not sure I have made this clear; I am not a native English speaker.)

Even if you specify y through aes, you won't get the desired result, because stat_summary computes a summary y at each x rather than one label per group.
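To make the per-x behaviour concrete, here is a minimal sketch that gives stat_summary a y so the error goes away. The constant y = 0 and the names p.demo and demo.data are assumptions made purely for illustration; they are not part of the question. Inspecting the layer data should show one summary row per distinct x value, each counting a single observation, rather than one row per group.

```r
library(ggplot2)

set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))

# same summary function as in the question
n_fun <- function(x){
  data.frame(y = max(x), label = paste0("n = ", length(x)))
}

# y = 0 is an arbitrary constant (an assumption), supplied only so that
# stat_summary has the y aesthetic it requires
p.demo <- ggplot(df, aes(x = val, y = 0, group = ab.class)) +
  stat_summary(geom = "text", fun.data = n_fun)

# look at what the stat actually computed for the text layer
demo.data <- layer_data(p.demo, 1)
nrow(demo.data)          # number of labels that would be drawn
table(demo.data$label)   # how many observations each label counted
```

Because val is continuous, almost every x value is unique, so the summary is taken over groups of size one. This is why the boxplot recipe, where x is discrete, does not carry over to a density plot.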

However, you can add the text at the desired positions with geom_text or annotate:

# save the plot as p
p <- ggplot(df, aes(x = val, group = ab.class)) +
    geom_density(aes(fill = ab.class), alpha = 0.4)

# build the data displayed on the plot.
p.data <- ggplot_build(p)$data[[1]]

# 'scaled' is the density rescaled to a maximum of 1 within each group,
# so its maximum marks the peak row of each group
p.text <- lapply(split(p.data, f = p.data$group), function(d){
    d[which.max(d$scaled), ]
})
p.text <- do.call(rbind, p.text)  # the same rows could also be selected with dplyr

# now add the text layer to the plot
p + annotate('text', x = p.text$x, y = p.text$y,
             label = sprintf('n = %d', p.text$n), vjust = 0)
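The question also asks for label colours set through aes(). One possible variant, sketched here under stated assumptions, computes a small per-group label data frame up front and passes it to geom_text. The helper peak_label and the names labels and p.labelled are hypothetical. The sketch uses base R's density(), which has the same default Gaussian kernel and nrd0 bandwidth rule as geom_density but evaluates on a different grid, so the labels should land close to the drawn peaks rather than exactly on them.

```r
library(ggplot2)

set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))

# Hypothetical helper (not part of ggplot2): peak position, peak height
# and sample-size label for one group's values
peak_label <- function(x) {
  d <- density(x)          # base R kernel density estimate
  i <- which.max(d$y)      # index of the highest point of the estimate
  data.frame(x = d$x[i], y = d$y[i], label = paste0("n = ", length(x)))
}

# one row per group; the list names ("A", "B") become the row names
labels <- do.call(rbind, lapply(split(df$val, df$ab.class), peak_label))
labels$ab.class <- rownames(labels)

# colour is mapped through aes(), so it follows the same default hues as fill
p.labelled <- ggplot(df, aes(x = val)) +
  geom_density(aes(fill = ab.class), alpha = 0.4) +
  geom_text(data = labels,
            aes(x = x, y = y, label = label, colour = ab.class),
            vjust = 0, show.legend = FALSE)
p.labelled
```

Since the labels live in an ordinary data frame, this generalises to any number of groups, and any other text aesthetic can be mapped the same way.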



Tags: r ggplot2