mutate and rowSums exclude columns

Posted 2020-04-08 15:28

Question:

Similar to: mutate rowSums exclude one column, but in my case I really want to be able to use select to remove a specific column or set of columns.

I'm trying to understand why something of this nature won't work.

d <- data.frame(
   Alpha = letters[1:26], 
   Beta = rnorm(26),
   Epsilon = rnorm(26),
   Gamma = rnorm(26)
)

I thought this would work, but it's giving me a strange error:

# Total = Beta + Gamma
d <- mutate(d,Total = rowSums(select(d,-Epsilon,-Alpha)))

Error: All select() inputs must resolve to integer column positions.
The following do not:
*  -structure(1:26, .Label = c("a", "b", "c", "d", "e", "f", "g", "h", "i...
In addition: Warning message:
In Ops.factor(1:26) : ‘-’ not meaningful for factors

I'd like to be able to do this in a long chain, and keep it "dplyr style"... it strikes me as odd that this is so difficult given that it's really straightforward without using typical dplyr syntax:

d$Total <- rowSums(select(d, -Alpha, -Epsilon)) # This works! 

Answer 1:

I'm only just learning dplyr, so perhaps it is because of version upgrades, but this does now work:

d %>% mutate(Total=rowSums(select(d,-Epsilon, -Alpha)))

These days, I usually see folks use the dot notation:

d %>% mutate(Total=rowSums(select(.,-Epsilon, -Alpha)))
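The dot matters most in a longer chain, because it refers to whatever data frame arrives from the left of the pipe, whereas `d` always refers to the original object. A minimal sketch, assuming a current dplyr with magrittr's `%>%`; the `filter()` step and `set.seed()` are added here for illustration and are not part of the question:

```r
library(dplyr)

set.seed(1)  # only so the illustration is reproducible
d <- data.frame(
  Alpha = letters[1:26],
  Beta = rnorm(26),
  Epsilon = rnorm(26),
  Gamma = rnorm(26)
)

# With `.`, select() sees the rows that survived filter(),
# so rowSums() returns one value per remaining row.
res <- d %>%
  filter(Beta > 0) %>%
  mutate(Total = rowSums(select(., -Epsilon, -Alpha)))

# Total should equal Beta + Gamma on every remaining row.
all.equal(unname(res$Total), res$Beta + res$Gamma)
```

Writing `select(d, ...)` in the same chain would sum over all 26 original rows, which no longer matches the filtered data.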

A slightly more manageable example:

df2 = data.frame(A=sample(0:20,10), B=sample(0:20, 10), C=sample(0:20,10), D=LETTERS[1:10])
df2
    A  B  C D
1  19  0  9 A
2   6 10 14 B
3  13 20  6 C
4  20  4 15 D
5   9 14  8 E
6  11  1 18 F
7   4 15 13 G
8  17  5  0 H
9  16  3 16 I
10  2  6  1 J
df2 %>% mutate(total=rowSums(select(.,-D)))
    A  B  C D total
1  19  0  9 A    28
2   6 10 14 B    30
3  13 20  6 C    39
4  20  4 15 D    39
5   9 14  8 E    31
6  11  1 18 F    30
7   4 15 13 G    32
8  17  5  0 H    22
9  16  3 16 I    35
10  2  6  1 J     9

NOTE:
The question you linked to has an updated answer that shows another approach using select_if, which keeps only the columns that satisfy a predicate (here, is.numeric):

df2 %>% mutate(total=rowSums(select_if(., is.numeric)))
    A  B  C D total
1  19  0  9 A    28
2   6 10 14 B    30
3  13 20  6 C    39
4  20  4 15 D    39
5   9 14  8 E    31
6  11  1 18 F    30
7   4 15 13 G    32
8  17  5  0 H    22
9  16  3 16 I    35
10  2  6  1 J     9
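In more recent dplyr releases, `select_if()` is marked as superseded, and the same predicate can be passed to `select()` through `where()`. A small sketch, assuming dplyr 1.0.0 or later; the data frame below reuses only the first three rows of `df2` shown above, typed in by hand because the original was generated with `sample()`:

```r
library(dplyr)

# First three rows of df2 from the output above.
df2 <- data.frame(
  A = c(19, 6, 13),
  B = c(0, 10, 20),
  C = c(9, 14, 6),
  D = LETTERS[1:3]
)

# where(is.numeric) keeps A, B and C and drops the non-numeric column D.
out <- df2 %>%
  mutate(total = rowSums(select(., where(is.numeric))))

unname(out$total)  # 28 30 39
```

dplyr 1.1.0 and later also provide `pick()`, which selects columns from the current data inside `mutate()` without needing the dot; whether it is available depends on the installed version.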


Answer 2:

@akrun has already provided a relevant link about this problem. As for a dplyr solution, I would actually use do:

d %>%
  do({
    .$Total <- rowSums(select(., -Epsilon, -Alpha))
    .
  })


Tags: r dplyr