Question:
I am trying to create a new column that will contain a result of calculations done rowwise over a subset of columns of a tibble, and add this new column to the existing tibble. Like so:
df <- tibble(
ID = c("one", "two", "three"),
A1 = c(1, 1, 1),
A2 = c(2, 2, 2),
A3 = c(3, 3, 3)
)
I effectively want to do a dplyr equivalent of this code from base R:
df$SumA <- rowSums(df[,grepl("^A", colnames(df))])
My problem is that this doesn't work:
df %>%
select(starts_with("A")) %>%
mutate(SumA = rowSums(.))
# some code here
...because I got rid of the "ID" column in order to let mutate run the rowSums over the other (numerical) columns. I have tried to cbind or bind_cols in the pipe after the mutate, but it doesn't work. None of the variants of mutate work, because they work in-place (within each cell of the tibble, and not across the columns, even with rowwise).
This does work, but doesn't strike me as an elegant solution:
df %>%
mutate(SumA = rowSums(.[,grepl("^A", colnames(df))]))
Is there any tidyverse-based solution that does not require grepl or square brackets but only more standard dplyr verbs and parameters?
My expected output is this:
df_out <- tibble(
ID = c("one", "two", "three"),
A1 = c(1, 1, 1),
A2 = c(2, 2, 2),
A3 = c(3, 3, 3),
SumA = c(6, 6, 6)
)
Best
kJ
Answer 1:
Here's one way to approach row-wise computation in the tidyverse using purrr::pmap. This is best used with functions that actually need to be run row by row; simple addition could probably be done a faster way. Basically we use select to provide the input list to pmap, which lets us use the select helpers such as starts_with, or matches if you need regex.
library(tidyverse)
df <- tibble(
ID = c("one", "two", "three"),
A1 = c(1, 1, 1),
A2 = c(2, 2, 2),
A3 = c(3, 3, 3)
)
df %>%
mutate(
SumA = pmap_dbl(
.l = select(., starts_with("A")),
.f = function(...) sum(...)
)
)
#> # A tibble: 3 x 5
#> ID A1 A2 A3 SumA
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 one 1 2 3 6
#> 2 two 1 2 3 6
#> 3 three 1 2 3 6
Created on 2019-01-30 by the reprex package (v0.2.1)
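To illustrate the point about functions that really do need to run row by row, here is a minimal sketch that computes the spread (max minus min) of each row's A columns. The helper name row_range and the changed sample values are made up for illustration, and the rowwise() + c_across() variant assumes dplyr 1.0.0 or later, which did not exist when this answer was written.

```r
library(dplyr)
library(purrr)

df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 5, 1),
  A2 = c(2, 2, 7),
  A3 = c(3, 3, 3)
)

# Hypothetical helper: pmap passes each selected column's value for the
# current row as a named argument, so c(...) collects one row's values.
row_range <- function(...) {
  x <- c(...)
  max(x) - min(x)
}

res_pmap <- df %>%
  mutate(RangeA = pmap_dbl(select(., starts_with("A")), row_range))

# The same computation with rowwise() + c_across() (assumes dplyr >= 1.0.0).
res_rowwise <- df %>%
  rowwise() %>%
  mutate(RangeA = max(c_across(starts_with("A"))) -
                  min(c_across(starts_with("A")))) %>%
  ungroup()
```

Both versions select the columns with the usual helpers; the pmap form keeps the row-wise logic in an ordinary function, which can be easier to test on its own.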
Answer 2:
Here's a different approach that doesn't move rowwise but instead exploits the vectorised nature of addition and that addition commutes. That lets us repeatedly apply + with purrr::reduce:
library(tidyverse)
df <- tibble(
ID = c("one", "two", "three"),
A1 = c(1, 1, 1),
A2 = c(2, 2, 2),
A3 = c(3, 3, 3)
)
df %>%
mutate(
SumA = reduce(
.x = select(., starts_with("A")),
.f = `+`
)
)
#> # A tibble: 3 x 5
#> ID A1 A2 A3 SumA
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 one 1 2 3 6
#> 2 two 1 2 3 6
#> 3 three 1 2 3 6
Created on 2019-01-30 by the reprex package (v0.2.1)
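As a rough sketch of what reduce is doing here, the fold can be written out by hand on plain vectors. The vector names and the NA value are made up for illustration; the NA is there only to show that + propagates missing values, whereas rowSums has an na.rm argument.

```r
library(purrr)

a1 <- c(1, 1, 1)
a2 <- c(2, 2, NA)
a3 <- c(3, 3, 3)

# reduce() folds from the left: (a1 + a2) + a3.
# Each `+` is vectorised, so every row is handled in one step.
by_reduce <- reduce(list(a1, a2, a3), `+`)
by_hand   <- (a1 + a2) + a3

# `+` propagates NA; rowSums(..., na.rm = TRUE) is one way to skip it.
by_rowsums <- rowSums(cbind(a1, a2, a3), na.rm = TRUE)
```

Because a tibble is a list of columns, passing select(., starts_with("A")) as .x amounts to the same list of vectors as above.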
Answer 3:
1) To do it with rowSums, try nesting a second pipeline in the mutate like this:
library(dplyr)
df %>% mutate(Sum = select(., starts_with("A")) %>% rowSums)
giving:
# A tibble: 3 x 5
ID A1 A2 A3 Sum
<chr> <dbl> <dbl> <dbl> <dbl>
1 one 1 2 3 6
2 two 1 2 3 6
3 three 1 2 3 6
2) An alternative is to reshape it to long form and then summarize:
library(dplyr)
library(purrr)
library(tidyr)
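df %>%
mutate(Sum = gather(., key, value, -ID) %>%
# summarize() returns groups in sorted order, so group on a factor that
# keeps the original row order (assumes each ID appears in only one row)
group_by(ID = factor(ID, levels = unique(ID))) %>%
summarize(sum = sum(value)) %>%
ungroup %>%
pull(sum))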
giving:
# A tibble: 3 x 5
ID A1 A2 A3 Sum
<chr> <dbl> <dbl> <dbl> <dbl>
1 one 1 2 3 6
2 two 1 2 3 6
3 three 1 2 3 6
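The nested select() pipeline in 1) can probably be written without the dot in newer versions of dplyr. This is a minimal sketch assuming dplyr 1.0.0 or later, where across() is available inside mutate; it is not part of the original answer. In dplyr 1.1.0 and later, pick(starts_with("A")) is the recommended spelling for the same column selection.

```r
library(dplyr)

df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

# across() with only a column selection returns a tibble of those columns,
# which rowSums() accepts directly (assumes dplyr >= 1.0.0).
df_out <- df %>%
  mutate(SumA = rowSums(across(starts_with("A"))))
```

This keeps the ID column in place and needs neither grepl nor square brackets, which is what the question asked for.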
Answer 4:
[upd] I didn't notice that @Calum used nearly the same approach.
Another possible way to do that:
library(dplyr)
library(purrr)
dat %>%
mutate(SumA = pmap_dbl(select(., contains('A')), sum))
Data:
# dat <- tibble(
# ID = c("one", "two", "three"),
# A1 = c(1, 1, 1),
# A2 = c(2, 2, 2),
# A3 = c(3, 3, 3)
# )
Output:
# # A tibble: 3 x 5
# ID A1 A2 A3 SumA
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 one 1 2 3 6
# 2 two 1 2 3 6
# 3 three 1 2 3 6
Answer 5:
You could nest and use rowSums on the nested columns:
library(tidyverse)
df %>% nest(-ID) %>%
mutate(SumA = map_dbl(data,rowSums)) %>%
unnest
# # A tibble: 3 x 5
# ID SumA A1 A2 A3
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 one 6 1 2 3
# 2 two 6 1 2 3
# 3 three 6 1 2 3
Or this variant on the pmap approach:
df %>% mutate(SumA = pmap_dbl(.[-1],sum))
# # A tibble: 3 x 5
# ID A1 A2 A3 SumA
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 one 1 2 3 6
# 2 two 1 2 3 6
# 3 three 1 2 3 6
And to show that base is sometimes easier:
df$SumA <- rowSums(df[-1])