R ggplot on-the-fly calculation by grouping variab

I have often wondered whether ggplot can do on-the-fly calculations by the groups of the plot, in a similar way to how they would be done using dplyr::group_by. In the example below, is it possible to calculate the cumsum for each category separately, rather than the overall cumsum, without altering df first?

library(ggplot2)

df <- data.frame(X = rep(1:20,2), Y = runif(40), category = rep(c("A","B"), each = 20))

ggplot(df, aes(x = X, y = cumsum(Y), colour = category))+geom_line()

I can obviously use an easy workaround with dplyr; however, as I do this frequently, I was keen to know whether there is a way to avoid specifying the grouping variables multiple times (here in group_by and aes(colour = …)).

Working alternative, but not what I'm asking for in this case:

library(dplyr)
library(ggplot2)

df %>%
  group_by(category) %>%
  mutate(Ysum = cumsum(Y)) %>%
  ggplot(aes(x = X, y = Ysum, colour = category)) +
  geom_line()

Edit (in response to @42-'s comment): I am mainly asking out of curiosity whether this is possible, not because the alternative doesn't work. I also think my code would be neater when I am making a number of plots that sum (or apply similar calculations to) different variables based on different columns or in different datasets, rather than continually having to group, mutate, then plot. I could write a function to do it for me, but I thought it might be built-in functionality that I am missing (the ggplot help doesn't go into the real details).
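For reference, the "write a function" route mentioned above might look roughly like the sketch below. The names plot_by_group and .y_calc are hypothetical (they are not part of ggplot2), and the sketch assumes a ggplot2 version that supports the .data pronoun inside aes(). It uses base R's ave() to apply the function within each group, so the grouping column is named only once in the call.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): applies `fun` to column `y`
# within each level of column `group`, then draws one line per group.
# Column names are passed as strings.
plot_by_group <- function(data, x, y, group, fun = cumsum) {
  # ave() applies `fun` separately within each level of the grouping column
  data$.y_calc <- ave(data[[y]], data[[group]], FUN = fun)
  ggplot(data, aes(x = .data[[x]], y = .y_calc, colour = .data[[group]])) +
    geom_line() +
    labs(x = x, y = paste(y, "(transformed per group)"), colour = group)
}

df <- data.frame(X = rep(1:20, 2),
                 Y = runif(40),
                 category = rep(c("A", "B"), each = 20))

# The grouping variable is specified once, as a string
p <- plot_by_group(df, x = "X", y = "Y", group = "category")
p
```

The helper still modifies a copy of the data internally; it only hides the group-then-mutate step behind a single call.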

Tags: r ggplot2

1 Answer

ゆ、 Hurt°

#2 · 2019-07-18 15:52

I have added stat_apply_group() and stat_apply_panel() to the development version of my package 'ggpmisc'. It will take some time before this update makes it to CRAN as the previous update has just been accepted.

For the time being 'ggpmisc' should be installed from Bitbucket for the new stats to be available.

devtools::install_bitbucket("aphalo/ggpmisc", ref = "no-debug")

Then this solves the question:

library(ggplot2)
library(ggpmisc)
set.seed(123456)
df <- data.frame(X = rep(1:20,2),
                 Y = runif(40),
                 category = rep(c("A","B"), each = 20))
ggplot(df, aes(x = X, y = Y, colour = category)) +
  stat_apply_group(.fun.y = cumsum)

Applying cumsum() within the ggplot code instead of using a 'dplyr' "pipe" as in the second example saves us from having to specify the grouping twice.
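If the development version of 'ggpmisc' is not available, a rough alternative in plain 'ggplot2' is to call base R's ave() inside aes(). This is only a sketch, not a replacement for stat_apply_group(): it computes the per-category cumsum on the fly without altering df, but the grouping variable still has to be named twice (once in ave() and once in colour).

```r
library(ggplot2)

set.seed(123456)
df <- data.frame(X = rep(1:20, 2),
                 Y = runif(40),
                 category = rep(c("A", "B"), each = 20))

# ave() applies cumsum() within each level of `category`,
# so the running total restarts for each line
p <- ggplot(df, aes(x = X,
                    y = ave(Y, category, FUN = cumsum),
                    colour = category)) +
  geom_line() +
  labs(y = "cumsum(Y) within category")
p
```

The explicit labs() call is there because the default y-axis title would otherwise be the full ave() expression.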
