I am new to tidyverse, conceptually I would like to calculate mean and 90% CI of all columns starts with "ab", grouped by "case". Tried many ways but none seem to work, my actual data has many columns so explicitly list them out is not an option.
test data below
library(tidyverse)
dat <- tibble(case= c("case1", "case1", "case2", "case2", "case3"),
abc = c(1, 2, 3, 1, 2),
abe = c(1, 3, 2, 3, 4),
bca = c(1, 6, 3, 8, 9))
below code is what i would like to do conceptually, but doesn't work, obviously
dat %>% group_by(`case`) %>%
summarise(mean=mean(select(starts_with("ab"))),
qt=quantile(select(starts_with("ab"), prob=c(0.05, 0.95))))
what I would like to get is something like below
case abc_mean abe_mean abc_lb abc_ub abe_lb abe_ub
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 case1 1.5 2.0 1.05 1.95 1.10 2.90
2 case2 2.0 2.5 1.10 2.90 2.05 2.95
3 case3 2.0 4.0 2.00 2.00 4.00 4.00
You were very close, just move that select
before the summarise
. We then use summarise_all
, and specify the appropriate functions within funs
.
dat %>%
group_by(case) %>%
select(starts_with('ab')) %>%
summarise_all(funs('mean' = mean, 'ub' = quantile(., .95), 'lb' = quantile(., .05)))
# # A tibble: 3 x 7
# case abc_mean abe_mean abc_ub abe_ub abc_lb abe_lb
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 case1 1.5 2.0 1.95 2.90 1.05 1.10
# 2 case2 2.0 2.5 2.90 2.95 1.10 2.05
# 3 case3 2.0 4.0 2.00 4.00 2.00 4.00
We use summarise_all
instead of summarise
because we wish to perform the same operations on multiple columns. It requires far less typing to use summarise_all
instead of a summarise
call in which we specify each column and each operation separately.
Another option is summarise_at
. vars(starts_with("ab"))
is for selecting columns, funs(...)
is to apply summarzing functions.
library(tidyverse)
dat2 <- dat %>%
group_by(case) %>%
summarise_at(vars(starts_with("ab")), funs(mean = mean(.),
lb = quantile(., prob = 0.05),
ub = quantile(., prob = 0.95)))
dat2
# # A tibble: 3 x 7
# case abc_mean abe_mean abc_lb abe_lb abc_ub abe_ub
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 case1 1.5 2.0 1.05 1.10 1.95 2.90
# 2 case2 2.0 2.5 1.10 2.05 2.90 2.95
# 3 case3 2.0 4.0 2.00 4.00 2.00 4.00