Using dplyr summarize with different operations fo

I know there are already many related questions, but none of them answers my particular need.

I want to use dplyr's summarize() on a table with 50 columns, and I need to apply different summary functions to different columns.

summarize_all and summarize_at both seem to have the disadvantage that it is not possible to apply different functions to different subgroups of variables.

As an example, let's assume the iris dataset had 50 columns, so we do not want to address columns by name. I want the sum of the first two columns, the mean of the third, and the first value of all remaining columns (after a group_by(Species)). How could I do this?

Tags: r dplyr

3 answers

Evening l夕情丶

#2 · 2020-02-11 09:26

As others have mentioned, this is normally done by calling summarize_each / summarize_at / summarize_if once for each group of columns that shares a summary function. As far as I know, you have to write a custom function that summarizes each subset and combines the results. You can, for example, name the columns so that the select helpers (e.g. contains()) pick out just the columns each function applies to. Failing that, you can specify the positions of the columns you want to summarize.

For the example you mentioned, you could try the following:

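library(dplyr)

# Summarize three column subsets of a grouped tibble, each with its own
# function, then bind the results side by side.
summarizer <- function(tb, colsone, colstwo, colsthree, 
                       funsone, funstwo, funsthree, group_name) {

  # {{ }} forwards each selection (helpers or positions) to select()
  bind_cols(
    summarize_all(select(tb, {{ colsone }}), .funs = funsone),
    # drop the repeated grouping column from the second and third pieces
    summarize_all(select(tb, {{ colstwo }}), .funs = funstwo) %>% 
      ungroup() %>% select(-matches(group_name)),
    summarize_all(select(tb, {{ colsthree }}), .funs = funsthree) %>% 
      ungroup() %>% select(-matches(group_name))
  )

}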

# With column names (select helpers)
iris %>% as_tibble() %>% 
  group_by(Species) %>% 
  summarizer(colsone = contains("Sepal"), 
             colstwo = matches("Petal.Length"), 
             colsthree = c(-contains("Sepal"), -matches("Petal.Length")),
             funsone = "sum", 
             funstwo = "mean",
             funsthree = "first",
             group_name = "Species")

# With column positions
iris %>% as_tibble() %>% 
  group_by(Species) %>% 
  summarizer(colsone = 1:2, 
             colstwo = 3, 
             colsthree = 4,
             funsone = "sum", 
             funstwo = "mean",
             funsthree = "first",
             group_name = "Species")
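
The per-subset idea above is not limited to three column groups. The following is only a sketch, not part of the original answer: `specs`, `grouped`, `pieces` and `result` are hypothetical names, and it assumes the grouping column is Species, as in the iris example. Each entry pairs a set of column positions with a summary function, each subset is summarized separately, and the pieces are joined back together on the grouping column.

```r
library(dplyr)

# Hypothetical spec: one entry per column subset (by position) and its function
specs <- list(
  list(cols = 1:2, fun = sum),
  list(cols = 3,   fun = mean),
  list(cols = 4,   fun = first)
)

grouped <- iris %>% group_by(Species)

# Summarize each subset separately; every piece keeps the Species column
pieces <- lapply(specs, function(s) summarise_at(grouped, s$cols, s$fun))

# Join the pieces back together on the grouping column
result <- Reduce(function(a, b) full_join(a, b, by = "Species"), pieces)
result
```

Adding another column group then only means adding another entry to `specs`.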


Anthone

#3 · 2020-02-11 09:26

You could summarise the data with each function separately and then join the data later if needed.

So something like this for the iris example:

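library(dplyr)

# one summary table per function, each grouped by Species
sums <- iris %>% group_by(Species) %>% summarise_at(1:2, sum)
means <- iris %>% group_by(Species) %>% summarise_at(3, mean)
firsts <- iris %>% group_by(Species) %>% summarise_at(4, first)

# join the pieces back together on the grouping column
full_join(sums, means, by = "Species") %>% full_join(firsts, by = "Species")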

That said, I would look for a different approach if you need more than a handful of summarising functions.
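
One such approach, offered only as a sketch and not as part of the original answer: assuming dplyr 1.0.0 or later is installed, across() lets a single summarise() call apply a different function to each column subset, so no join is needed. The positions below refer to the non-grouping columns of iris, and `result` is a hypothetical name.

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  summarise(
    across(1:2, sum),   # sum of the first two columns
    across(3, mean),    # mean of the third column
    across(4, first)    # first value of the remaining column
  )
result
```

Column names or select helpers such as starts_with() can be used in place of the positions.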


chillily

#4 · 2020-02-11 09:35

Try this:

# load plyr before dplyr so that dplyr's functions are not masked
library(plyr)
library(dplyr)

dataframe <- data.frame(var  = c(1, 1, 1, 2, 2, 2),
                        var2 = c(10, 9, 8, 7, 6, 5),
                        var3 = c(2, 3, 4, 5, 6, 7),
                        var4 = c(5, 5, 3, 2, 4, 2))
dataframe

#  var var2 var3 var4
#1   1   10    2    5
#2   1    9    3    5
#3   1    8    4    3
#4   2    7    5    2
#5   2    6    6    4
#6   2    5    7    2

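# functions to apply, paired position by position with the columns below
funnames <- c(sum, mean, first)
colnums <- c(2, 3, 4)

# for each value of "var", apply the i-th function to the i-th listed column
ddply(.data = dataframe, .variables = "var",
      function(x, funcs, inds) {
        mapply(function(func, ind) {
          func(x[, ind])
        }, funcs, inds)
      }, funnames, colnums)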

#  var V1 V2 V3
#1   1 27  3  5
#2   2 18  6  2
