I have often wondered whether you can get ggplot to do on-the-fly calculations by the groups of the plot (facets, colours), in a similar way to how they would be done using dplyr::group_by. For example, in the code below, is it possible to calculate the cumsum for each category separately, rather than the overall cumsum, without altering df first?
library(ggplot2)
df <- data.frame(X = rep(1:20,2), Y = runif(40), category = rep(c("A","B"), each = 20))
ggplot(df, aes(x = X, y = cumsum(Y), colour = category)) + geom_line()
I can obviously do an easy workaround using dplyr. However, as I do this frequently, I was keen to know if there is a way to avoid having to specify the grouping variables multiple times (here in group_by and in aes(colour = …)).
Working alternative (but not what I'm asking for in this case):
library(dplyr)
library(ggplot2)
df %>% group_by(category) %>% mutate(Ysum = cumsum(Y)) %>%
  ggplot(aes(x = X, y = Ysum, colour = category)) + geom_line()
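For comparison, here is a minimal dplyr-free sketch that also leaves df unchanged: base R's ave() computes the per-category cumsum directly inside aes(). It still names category twice, so it does not remove the repetition the question is about, and it assumes the rows are already ordered by X within each category (as they are in this df).

```r
library(ggplot2)

set.seed(1)
df <- data.frame(X = rep(1:20, 2), Y = runif(40),
                 category = rep(c("A", "B"), each = 20))

# ave() splits Y by category, applies cumsum() to each piece and returns
# the results in the original row order, so df itself is not modified.
p <- ggplot(df, aes(x = X, y = ave(Y, category, FUN = cumsum),
                    colour = category)) +
  geom_line() +
  labs(y = "cumsum(Y) by category")
# print(p) draws the plot
```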
Edit (in response to the comment from @42-): I am mainly asking out of curiosity whether this is possible, not because the alternative doesn't work. I also think it would make my code neater when I am making a number of plots that sum (or do similar calculations on) different variables, grouped by different columns or in different datasets, rather than continually having to group, mutate and then plot. I could write a function to do it for me, but I thought there might be built-in functionality that I am missing (the ggplot help doesn't go into the details).
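A helper function of the kind mentioned in the edit could look roughly like this. plot_by_group() and the .y_calc column are hypothetical names chosen for illustration, and the sketch assumes dplyr and ggplot2 versions recent enough to support the {{ }} ("curly-curly") syntax. The grouping column is passed once and used both for group_by() and for colour.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: applies `fun` to `y` within each level of `group`
# and maps `group` to colour, so the grouping column is named only once.
plot_by_group <- function(data, x, y, group, fun = cumsum) {
  data %>%
    arrange({{ group }}, {{ x }}) %>%       # order by x so cumsum runs left to right
    group_by({{ group }}) %>%
    mutate(.y_calc = fun({{ y }})) %>%      # per-group calculation
    ungroup() %>%
    ggplot(aes(x = {{ x }}, y = .y_calc, colour = {{ group }})) +
    geom_line()
}

set.seed(1)
df <- data.frame(X = rep(1:20, 2), Y = runif(40),
                 category = rep(c("A", "B"), each = 20))

p <- plot_by_group(df, X, Y, category)
```

Any other vector-to-vector function can be passed as fun, e.g. plot_by_group(df, X, Y, category, fun = cummax).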
I have added stat_apply_group() and stat_apply_panel() to the development version of my package 'ggpmisc'. It will take some time before this update makes it to CRAN, as the previous update has just been accepted. For the time being, 'ggpmisc' should be installed from Bitbucket for the new stats to be available.
Then this solves the question:
Applying cumsum() within the ggplot code, instead of using a 'dplyr' "pipe" as in the second example, saves us from having to specify the grouping twice.