Question:
There are a couple of issues about this on the dplyr Github repo already, and at least one related SO question, but none of them quite covers my question -- I think.
- Adding multiple columns in a dplyr mutate call is more or less what I want, but there's a special-case answer for that case (tidyr::separate) that doesn't (I think) work for me.
- This issue ("summarise or mutate with functions returning multiple values/columns") says "use do()".
Here's my use case: I want to compute exact binomial confidence intervals.
dd <- data.frame(x=c(3,4),n=c(10,11))
get_binCI <- function(x,n) {
  rbind(setNames(c(binom.test(x,n)$conf.int),c("lwr","upr")))
}
with(dd[1,],get_binCI(x,n))
## lwr upr
## [1,] 0.06673951 0.6524529
I can get this done with do(), but I wonder if there's a more expressive way to do this (it feels like mutate() could have a .n argument, as is being discussed for summarise() ...)
library("dplyr")
dd %>% group_by(x,n) %>%
do(cbind(.,get_binCI(.$x,.$n)))
## Source: local data frame [2 x 4]
## Groups: x, n
##
## x n lwr upr
## 1 3 10 0.06673951 0.6524529
## 2 4 11 0.10926344 0.6920953
Answer 1:
Yet another variant, although I think we're all splitting hairs here.
> dd <- data.frame(x=c(3,4),n=c(10,11))
> get_binCI <- function(x,n) {
+ as_data_frame(setNames(as.list(binom.test(x,n)$conf.int),c("lwr","upr")))
+ }
>
> dd %>%
+ group_by(x,n) %>%
+ do(get_binCI(.$x,.$n))
Source: local data frame [2 x 4]
Groups: x, n
x n lwr upr
1 3 10 0.06673951 0.6524529
2 4 11 0.10926344 0.6920953
Personally, if we're just going by readability, I find this preferable:
foo <- function(x,n){
  bi <- binom.test(x,n)$conf.int
  data_frame(lwr = bi[1],
             upr = bi[2])
}
dd %>%
  group_by(x,n) %>%
  do(foo(.$x,.$n))
...but now we're really splitting hairs.
Answer 2:
Yet another option could be to use the purrr::map family of functions.
If you replace rbind with dplyr::bind_rows in the get_binCI function:
library(tidyverse)
dd <- data.frame(x = c(3, 4), n = c(10, 11))
get_binCI <- function(x, n) {
bind_rows(setNames(c(binom.test(x, n)$conf.int), c("lwr", "upr")))
}
You can use purrr::map2 with tidyr::unnest (naming the list column explicitly, which newer tidyr versions expect):
dd %>% mutate(result = map2(x, n, get_binCI)) %>% unnest(result)
#> x n lwr upr
#> 1 3 10 0.06673951 0.6524529
#> 2 4 11 0.10926344 0.6920953
Or purrr::map2_dfr with dplyr::bind_cols:
dd %>% bind_cols(map2_dfr(.$x, .$n, get_binCI))
#> x n lwr upr
#> 1 3 10 0.06673951 0.6524529
#> 2 4 11 0.10926344 0.6920953
Answer 3:
Here's a quick solution using the data.table package instead.
First, a little change to the function:
get_binCI <- function(x,n) as.list(setNames(binom.test(x,n)$conf.int, c("lwr", "upr")))
Then, simply:
library(data.table)
setDT(dd)[, get_binCI(x, n), by = .(x, n)]
# x n lwr upr
# 1: 3 10 0.06673951 0.6524529
# 2: 4 11 0.10926344 0.6920953
Answer 4:
This uses a "standard" dplyr workflow, but as @BenBolker notes in the comments, it requires calling get_binCI twice:
dd %>% group_by(x,n) %>%
mutate(lwr=get_binCI(x,n)[1],
upr=get_binCI(x,n)[2])
x n lwr upr
1 3 10 0.06673951 0.6524529
2 4 11 0.10926344 0.6920953
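If the double call is a concern, one way around it is to compute each interval once and then split the result into two columns. The following is only a minimal base R sketch of that idea, not something proposed in the thread; the names ci and res are illustrative.

```r
dd <- data.frame(x = c(3, 4), n = c(10, 11))

# One binom.test() call per row; each call returns c(lower, upper),
# so mapply() simplifies the results to a 2 x nrow(dd) matrix
ci <- mapply(function(x, n) binom.test(x, n)$conf.int, dd$x, dd$n)

# Row 1 of `ci` holds the lower bounds, row 2 the upper bounds
res <- cbind(dd, lwr = ci[1, ], upr = ci[2, ])
res
```

This keeps one call per row at the cost of leaving the dplyr pipeline.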
Answer 5:
Here are some possibilities with rowwise and nesting.
library("dplyr")
library("tidyr")
A data frame with repeated x/n combinations, for fun:
dd <- data.frame(x=c(3, 4, 3), n=c(10, 11, 10))
A version of the CI function that returns a data frame, like @Joran's:
get_binCI_df <- function(x,n) {
  binom.test(x, n)$conf.int %>%
    setNames(c("lwr", "upr")) %>%
    as.list() %>% as.data.frame()
}
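Grouping by x and n, as before, removes the duplicate. The duplicated group has two rows, so only its first x and n are passed on; binom.test would read a length-2 x as counts of successes and failures.
dd %>% group_by(x,n) %>% do(get_binCI_df(.$x[1],.$n[1]))
# # A tibble: 2 x 4
# # Groups: x, n [2]
# x n lwr upr
# <dbl> <dbl> <dbl> <dbl>
# 1 3 10 0.06673951 0.6524529
# 2 4 11 0.10926344 0.6920953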
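One pitfall when a group holds more than one row is that .$x and .$n become vectors, and binom.test() treats a length-2 x as c(successes, failures) while ignoring n. A small base R check of that behaviour, with illustrative variable names:

```r
# A length-2 x is read as c(successes, failures); n is then ignored
ci_vec     <- binom.test(c(3, 3), c(10, 10))$conf.int

# For comparison: 3 successes in 6 trials, and 3 successes in 10 trials
ci_3_of_6  <- binom.test(3, 6)$conf.int
ci_3_of_10 <- binom.test(3, 10)$conf.int

# The vector call matches the 3-of-6 interval, not the 3-of-10 one
isTRUE(all.equal(c(ci_vec), c(ci_3_of_6)))
isTRUE(all.equal(c(ci_vec), c(ci_3_of_10)))
```

So a grouped do() call only returns the intended interval when each group passes a single x and n.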
Using rowwise keeps all the rows but removes x and n unless you put them back by wrapping the result in cbind(., ...) (as Ben does in the original post).
dd %>% rowwise() %>% do(cbind(., get_binCI_df(.$x,.$n)))
# Source: local data frame [3 x 4]
# Groups: <by row>
#
# # A tibble: 3 x 4
# x n lwr upr
# * <dbl> <dbl> <dbl> <dbl>
# 1 3 10 0.06673951 0.6524529
# 2 4 11 0.10926344 0.6920953
# 3 3 10 0.06673951 0.6524529
It feels like nesting could work more cleanly, but this is as good as I can get. Using mutate means I can use x and n directly instead of .$x and .$n, but mutate expects a single value, so it needs to be wrapped in list.
dd %>% rowwise() %>% mutate(ci=list(get_binCI_df(x, n))) %>% unnest(ci)
# # A tibble: 3 x 4
# x n lwr upr
# <dbl> <dbl> <dbl> <dbl>
# 1 3 10 0.06673951 0.6524529
# 2 4 11 0.10926344 0.6920953
# 3 3 10 0.06673951 0.6524529
Finally, it looks like something along these lines is an open issue for dplyr (as of 5 Oct 2017); see https://github.com/tidyverse/dplyr/issues/2326. If it is implemented, that will be the easiest way!
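For readers on a newer dplyr: as far as I know, releases from 1.0.0 onward let mutate() take an unnamed expression that returns a data frame and unpack it into separate columns. The sketch below assumes such a version is installed; whether this is how the linked issue was resolved is not something the thread confirms. The name res is illustrative.

```r
library(dplyr)

dd <- data.frame(x = c(3, 4, 3), n = c(10, 11, 10))

# Returns a one-row data frame with columns lwr and upr
get_binCI_df <- function(x, n) {
  ci <- binom.test(x, n)$conf.int
  data.frame(lwr = ci[1], upr = ci[2])
}

# Assumes dplyr >= 1.0.0: the unnamed data-frame result is unpacked
# into lwr and upr, with one binom.test() call per row
res <- dd %>%
  rowwise() %>%
  mutate(get_binCI_df(x, n)) %>%
  ungroup()
res
```

All three rows are kept, including the repeated x/n combination, and x and n stay in place without cbind or unnest.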