I'm working on a function that will manipulate a data frame based on a string. Within the function, I'll build a column name from the string and use it to manipulate the data frame, something like this:
library(dplyr)
orig_df <- data_frame(
id = 1:3
, amt = c(100, 200, 300)
, anyA = c(T,F,T)
, othercol = c(F,F,T)
)
summarize_my_df_broken <- function(df, my_string) {
  my_column <- quo(paste0("any", my_string))
  df %>%
    filter(!!my_column) %>%
    group_by(othercol) %>%
    summarize(
      n = n()
      , total = sum(amt)
    ) %>%
    # I need the original string as new column which is why I can't
    # pass in just the column name
    mutate(stringid = my_string)
}
summarize_my_df_works <- function(df, my_string) {
  my_column <- quo(paste0("any", my_string))
  df %>%
    group_by(!!my_column, othercol) %>%
    summarize(
      n = n()
      , total = sum(amt)
    ) %>%
    mutate(stringid = my_string)
}
# throws an error:
# Argument 2 filter condition does not evaluate to a logical vector
summarize_my_df_broken(orig_df, "A")
# works just fine
summarize_my_df_works(orig_df, "A")
I understand what the problem is: unquoting the quosure as an argument to filter() in the broken version is not referencing the actual column anyA. What I don't understand is why it works in group_by(), but not in filter(). Why is there a difference?
Right now you are making quosures of strings, not symbol names. That's not how those are supposed to be used. There's a big difference between quo("hello") and quo(hello). If you want to make a proper symbol name from a string, you need to use rlang::sym. So a quick fix would be:
summarize_my_df_broken <- function(df, my_string) {
  my_column <- rlang::sym(paste0("any", my_string))
  ...
}
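For reference, here is a minimal sketch of the whole function with that fix applied. The name summarize_my_df_fixed is made up for illustration, and tibble() stands in for data_frame(); everything else follows the question's code.

```r
library(dplyr)

orig_df <- tibble(
  id = 1:3,
  amt = c(100, 200, 300),
  anyA = c(TRUE, FALSE, TRUE),
  othercol = c(FALSE, FALSE, TRUE)
)

# Hypothetical name; same body as the question's function,
# except that the column is built as a symbol rather than a quoted string.
summarize_my_df_fixed <- function(df, my_string) {
  my_column <- rlang::sym(paste0("any", my_string))
  df %>%
    filter(!!my_column) %>%          # now refers to the column anyA
    group_by(othercol) %>%
    summarize(n = n(), total = sum(amt)) %>%
    mutate(stringid = my_string)
}

fixed_result <- summarize_my_df_fixed(orig_df, "A")
fixed_result
```

With the sample data, only the rows where anyA is TRUE survive the filter, so each othercol group ends up with a single row.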
If you look more closely, I think you'll see the group_by/summarize isn't actually working the way you expect either (though you don't get the same error message). These two do not produce the same results:
summarize_my_df_works(orig_df, "A")
# `paste0("any", my_string)` othercol n total
# <chr> <lgl> <int> <dbl>
# 1 anyA FALSE 2 300
# 2 anyA TRUE 1 300
orig_df %>%
  group_by(anyA, othercol) %>%
  summarize(
    n = n()
    , total = sum(amt)
  ) %>%
  mutate(stringid = "A")
# anyA othercol n total stringid
# <lgl> <lgl> <int> <dbl> <chr>
# 1 FALSE FALSE 1 200 A
# 2 TRUE FALSE 1 100 A
# 3 TRUE TRUE 1 300 A
Again the problem is using a string instead of a symbol.
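To make the string-versus-symbol difference concrete, here is a small sketch that evaluates both forms against a data frame with rlang::eval_tidy(). The toy data frame and variable names are only for illustration.

```r
library(rlang)

toy_df <- data.frame(anyA = c(TRUE, FALSE, TRUE))

# A quosure wrapping a paste0() call: evaluating it just returns the string.
string_quo <- quo(paste0("any", "A"))
string_value <- eval_tidy(string_quo, data = toy_df)
string_value   # "anyA"

# A symbol built from the same string: evaluating it looks up the column.
column_sym <- sym(paste0("any", "A"))
column_value <- eval_tidy(column_sym, data = toy_df)
column_value   # TRUE FALSE TRUE
```

This is why group_by() silently groups by a constant character column, while filter() complains that it did not get a logical vector.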
You don't have any conditions for filter() in your 'broken' function; you just specify the column name.
Beyond that, I'm not sure if you can insert quosures into larger expressions. For example, here you might try something like:
df %>% filter((!!my_column) == TRUE)
But I don't think that would work.
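For what it's worth, a quick check suggests that unquoting inside a larger expression is fine, as long as my_column holds a symbol (for example from rlang::sym) rather than a quosure around a string. A minimal sketch with made-up data:

```r
library(dplyr)

check_df <- tibble(
  anyA = c(TRUE, FALSE, TRUE),
  amt = c(100, 200, 300)
)

# Assumption: the column name is turned into a symbol, not quo("anyA").
my_column <- rlang::sym("anyA")

# The unquoted symbol sits inside a comparison and is resolved as a column.
kept <- check_df %>% filter((!!my_column) == TRUE)
kept$amt   # 100 300
```

The parentheses around !!my_column keep the unquoting from swallowing the rest of the comparison.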
Instead, I would suggest using the conditional function filter_at() to target the appropriate column. In that case, you separate the quosure from the filter condition:
summarize_my_df_broken <- function(df, my_string) {
  my_column <- quo(paste0("any", my_string))
  df %>%
    filter_at(vars(!!my_column), all_vars(. == TRUE)) %>%
    group_by(othercol) %>%
    summarize(
      n = n()
      , total = sum(amt)
    ) %>%
    mutate(stringid = my_string)
}
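Another way to keep the column name as a plain string, sketched here as an alternative rather than as part of the answer above, is the .data pronoun that dplyr re-exports from rlang. The function name summarize_my_df_pronoun is hypothetical.

```r
library(dplyr)

orig_df <- tibble(
  id = 1:3,
  amt = c(100, 200, 300),
  anyA = c(TRUE, FALSE, TRUE),
  othercol = c(FALSE, FALSE, TRUE)
)

# Hypothetical name; the column is looked up by string via .data[[...]],
# so no quosure or symbol has to be built by hand.
summarize_my_df_pronoun <- function(df, my_string) {
  col_name <- paste0("any", my_string)
  df %>%
    filter(.data[[col_name]]) %>%
    group_by(othercol) %>%
    summarize(n = n(), total = sum(amt)) %>%
    mutate(stringid = my_string)
}

pronoun_result <- summarize_my_df_pronoun(orig_df, "A")
pronoun_result
```

The original string stays available for the stringid column, which was the reason for passing a string in the first place.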