Sum multiple variables by group and create new col

2020-03-31 04:37发布

I have a data frame with grouped variable and I want to sum them by group. It's easy with dplyr.

library(dplyr)
library(magrittr)

data <- data.frame(group = c("a", "a", "b", "c", "c"), n1 = 1:5, n2 = 2:6)

data %>% group_by(group) %>%
  summarise_all(sum)

# A tibble: 3 x 3
   group    n1    n2
  <fctr> <int> <int>
1      a     3     5
2      b     3     4
3      c     9    11

But now I want a new column total with the sum of n1 and n2 by group. Like this:

# A tibble: 3 x 3
   group    n1    n2   ttl
  <fctr> <int> <int> <int>
1      a     3     5     8
2      b     3     4     7
3      c     9    11    20

How can I do that with dplyr?

EDIT: Actually, it's just an example, I have a lot of variables.

I tried these two codes but it's not in the right dimension...

data %>% group_by(group) %>%
  summarise_all(sum) %>%
  summarise_if(is.numeric, sum)

data %>% group_by(group) %>%
  summarise_all(sum) %>%
  mutate_if(is.numeric, .funs = sum)

4条回答
The star\"
2楼-- · 2020-03-31 04:59

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(data)), grouped by 'group', get the sum of each columns in the Subset of data.table, and then with Reduce, get the sum of the rows of the columns of interest

library(data.table)
setDT(data)[, lapply(.SD, sum) , group][, tt1 := Reduce(`+`, .SD),  
                       .SDcols = names(data)[-1]][]
#   group n1 n2 tt1
#1:     a  3  5   8
#2:     b  3  4   7
#3:     c  9 11  20

Or with base R

addmargins(as.matrix(rowsum(data[-1], data$group)), 2)
#  n1 n2 Sum
#a  3  5   8
#b  3  4   7
#c  9 11  20

Or with dplyr

data %>%
   group_by(group) %>%
   summarise_all(sum) %>% 
   mutate(tt = rowSums(.[-1]))
查看更多
劳资没心,怎么记你
3楼-- · 2020-03-31 05:12

Base R

cbind(aggregate(.~group, data, sum), ttl = sapply(split(data[,-1], data$group), sum))
#  group n1 n2 ttl
#a     a  3  5   8
#b     b  3  4   7
#c     c  9 11  20
查看更多
爱情/是我丢掉的垃圾
4楼-- · 2020-03-31 05:23

We can use apply together with the dplyr functions.

data <- data.frame(group = c("a", "a", "b", "c", "c"), n1 = 1:5, n2 = 2:6)

data %>% group_by(group) %>%
  summarise_all(sum) %>%
  mutate(ttl = apply(.[, 2:ncol(.)], 1, sum))

# A tibble: 3 × 4
   group    n1    n2   ttl
  <fctr> <int> <int> <int>
1      a     3     5     8
2      b     3     4     7
3      c     9    11    20

Or rowSums with the same strategy. The key is to use . to specify the data frame and [] with x:ncol(.) to keep the columns you want.

data %>% group_by(group) %>%
  summarise_all(sum) %>%
  mutate(ttl = rowSums(.[, 2:ncol(.)]))

# A tibble: 3 × 4
   group    n1    n2   ttl
  <fctr> <int> <int> <dbl>
1      a     3     5     8
2      b     3     4     7
3      c     9    11    20
查看更多
小情绪 Triste *
5楼-- · 2020-03-31 05:24

You can use mutate after summarize:

data %>% 
    group_by(group) %>%
    summarise_all(sum) %>% 
    mutate(tt1 = n1 + n2)

# A tibble: 3 x 4
#   group    n1    n2   tt1
#  <fctr> <int> <int> <int>
#1      a     3     5     8
#2      b     3     4     7
#3      c     9    11    20

If need to sum all numeric columns, you can use rowSums with select_if (to select numeric columns) to sum columns up:

data %>% 
    group_by(group) %>%
    summarise_all(sum) %>% 
    mutate(tt1 = rowSums(select_if(., is.numeric)))

# A tibble: 3 x 4
#   group    n1    n2   tt1
#  <fctr> <int> <int> <dbl>
#1      a     3     5     8
#2      b     3     4     7
#3      c     9    11    20
查看更多
登录 后发表回答