I am trying to write a function in tidyverse/dplyr that I want to eventually use with lapply (or map). (I had been working on it to answer this question, but came upon an interesting result/dead-end. Please don't mark this as a duplicate - this question is an extension/departure from the answers that you see there.)
Is there
1) a way to get a list of quoted variables to work inside a dplyr function (and not use the deprecated SE_ functions), or is there
2) some way to feed a list of unquoted strings through lapply or map?
I have used the Programming with dplyr vignette to construct what I believe is a function most in line with the current standard for working with NSE.
The sample data:
sample_data <-
read.table(text = "REVENUEID AMOUNT YEAR REPORT_CODE PAYMENT_METHOD INBOUND_CHANNEL AMOUNT_CAT
1 rev-24985629 30 FY18 S Check Mail 25,50
2 rev-22812413 1 FY16 Q Other Canvassing 0.01,10
3 rev-23508794 100 FY17 Q Credit_card Web 100,250
4 rev-23506121 300 FY17 S Credit_card Mail 250,500
5 rev-23550444 100 FY17 S Credit_card Web 100,250
6 rev-21508672 25 FY14 J Check Mail 25,50
7 rev-24981769 500 FY18 S Credit_card Web 500,1e+03
8 rev-23503684 50 FY17 R Check Mail 50,75
9 rev-24982087 25 FY18 R Check Mail 25,50
10 rev-24979834 50 FY18 R Credit_card Web 50,75
", header = TRUE, stringsAsFactors = FALSE)
A report-generating function:
report <- function(report_cat){
  report_cat <- enquo(report_cat)
  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num=n(),total=sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY := as.character(quote(!!report_cat))[2])
}
Which works fine for generating a single report:
> report(REPORT_CODE)
# A tibble: 7 x 5
# Groups:   REPORT_VALUE [4]
  REPORT_VALUE YEAR    num total REPORT_CATEGORY
  <chr>        <chr> <int> <int> <chr>
1 J            FY14      1    25 REPORT_CODE
2 Q            FY16      1     1 REPORT_CODE
3 Q            FY17      1   100 REPORT_CODE
4 R            FY17      1    50 REPORT_CODE
5 R            FY18      2    75 REPORT_CODE
6 S            FY17      2   400 REPORT_CODE
7 S            FY18      2   530 REPORT_CODE
It is when I try to set up a list of all 4 of the reports to generate that everything breaks down. (Though admittedly the code required in that last line of the function - to return a string with which to then fill the column - should be clue enough that I have wandered off in the wrong direction.)
#the other reports
cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
# Applying and Mapping attempts
lapply(cat.list, report)
map_df(cat.list, report)
Which results in:
> lapply(cat.list, report)
Error in (function (x, strict = TRUE) : the argument has already been evaluated
> map_df(cat.list, report)
Error in (function (x, strict = TRUE) : the argument has already been evaluated
I have also tried to convert the list of strings to names before handing it over to apply and map:
library(rlang)
cat.names <- lapply(cat.list, sym)
lapply(cat.names, report)
map_df(cat.names, report)
> lapply(cat.names, report)
Error in (function (x, strict = TRUE) : the argument has already been evaluated
> map_df(cat.names, report)
Error in (function (x, strict = TRUE) : the argument has already been evaluated
In any case, the reason I am asking this question is that I think that I have written the function to the currently documented standards, but ultimately I can then see no way to utilize a member of the apply or even of the purrr::map family with such a function. Short of rewriting the function to use names like useR has done here https://stackoverflow.com/a/47316151/5088194, is there a way to get this function to work with apply or map?
I am hoping to see this as a result:
# A tibble: 27 x 5
# Groups:   REPORT_VALUE [16]
   REPORT_VALUE YEAR    num total REPORT_CATEGORY
   <chr>        <chr> <int> <int> <chr>
 1 J            FY14      1    25 REPORT_CODE
 2 Q            FY16      1     1 REPORT_CODE
 3 Q            FY17      1   100 REPORT_CODE
 4 R            FY17      1    50 REPORT_CODE
 5 R            FY18      2    75 REPORT_CODE
 6 S            FY17      2   400 REPORT_CODE
 7 S            FY18      2   530 REPORT_CODE
 8 Check        FY14      1    25 PAYMENT_METHOD
 9 Check        FY17      1    50 PAYMENT_METHOD
10 Check        FY18      2    55 PAYMENT_METHOD
# ... with 17 more rows
Let me first point out that in your initial report function, you can use quo_name to convert the quosure into a string, which you can then use in mutate.

Now, to address your question of "how to feed a list of unquoted strings through lapply or map to make it work inside dplyr functions", I propose two ways of doing it.

1. Use rlang::sym to parse your strings and unquote them when feeding them into lapply or map. Alternatively, with syms you can parse all elements of a vector at once.
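The code that accompanied these steps is not shown here, so the following is only a minimal sketch of what they describe, not necessarily the original snippet. It assumes dplyr, purrr and rlang are installed, and it uses a cut-down copy of sample_data (four columns) and a two-element cat.list so that it runs on its own.

```r
library(dplyr)
library(purrr)
library(rlang)

# Assumption: reduced stand-in for the question's sample_data (same 10 rows, 4 columns)
sample_data <- data.frame(
  AMOUNT         = c(30, 1, 100, 300, 100, 25, 500, 50, 25, 50),
  YEAR           = c("FY18", "FY16", "FY17", "FY17", "FY17",
                     "FY14", "FY18", "FY17", "FY18", "FY18"),
  REPORT_CODE    = c("S", "Q", "Q", "S", "S", "J", "S", "R", "R", "R"),
  PAYMENT_METHOD = c("Check", "Other", "Credit_card", "Credit_card", "Credit_card",
                     "Check", "Credit_card", "Check", "Check", "Credit_card"),
  stringsAsFactors = FALSE
)

# Same report as in the question, with quo_name() supplying the label column
report <- function(report_cat) {
  report_cat <- enquo(report_cat)
  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY = quo_name(report_cat))
}

cat.list <- c("REPORT_CODE", "PAYMENT_METHOD")

# Convert one string at a time with sym() and unquote it in the call
one_by_one <- lapply(cat.list, function(x) report(!!sym(x)))

# Or convert the whole vector with syms() and unquote each element
all_at_once <- map_df(syms(cat.list), ~ report(!!.x))
```

The !! at the call site is what lets enquo see the symbol (for example REPORT_CODE) rather than the loop variable, which is why report itself does not need to change.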
2. Rewrite your report function by placing lapply or map inside so that report can do NSE

By placing map_df inside report, you can take advantage of quos, which converts ... to a list of quosures. They are then fed into map_df and unquoted one by one using !!.

Another advantage of writing it like this is that you can also supply a vector of strings converted to symbols and splice them using !!!.
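As a rough illustration of this second approach (a sketch only, since the original code is not shown here), report can collect its arguments with quos and loop over them internally. The reduced sample_data and the two-element cat.list below are assumptions made so the example is self-contained.

```r
library(dplyr)
library(purrr)
library(rlang)

# Assumption: reduced stand-in for the question's sample_data (same 10 rows, 4 columns)
sample_data <- data.frame(
  AMOUNT         = c(30, 1, 100, 300, 100, 25, 500, 50, 25, 50),
  YEAR           = c("FY18", "FY16", "FY17", "FY17", "FY17",
                     "FY14", "FY18", "FY17", "FY18", "FY18"),
  REPORT_CODE    = c("S", "Q", "Q", "S", "S", "J", "S", "R", "R", "R"),
  PAYMENT_METHOD = c("Check", "Other", "Credit_card", "Credit_card", "Credit_card",
                     "Check", "Credit_card", "Check", "Check", "Credit_card"),
  stringsAsFactors = FALSE
)

# report() now takes any number of bare column names through `...`
report <- function(...) {
  report_cats <- quos(...)  # list of quosures, one per argument
  map_df(report_cats, function(report_cat) {
    sample_data %>%
      group_by(!!report_cat, YEAR) %>%
      summarize(num = n(), total = sum(AMOUNT)) %>%
      rename(REPORT_VALUE = !!report_cat) %>%
      mutate(REPORT_CATEGORY = quo_name(report_cat))
  })
}

# Bare names
by_name <- report(REPORT_CODE, PAYMENT_METHOD)

# Strings converted with syms() and spliced into `...` with !!!
cat.list <- c("REPORT_CODE", "PAYMENT_METHOD")
by_string <- report(!!!syms(cat.list))
```

Both calls should produce the same stacked report; the second form is the one that accepts a character vector built elsewhere.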
as.name will convert a string to a name, and that can be passed to report.

character argument

An alternative is to rewrite report so that it accepts a character string argument.

wrapr

An alternate approach is to rewrite report using the wrapr package, which is an alternative to rlang/tidyeval.

Of course, this whole problem would go away if you used a different framework, e.g.
plyr
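A possible plyr version, offered as a sketch rather than the original snippet. It assumes the plyr package is installed, calls it through the plyr:: prefix to avoid masking dplyr verbs, and uses a reduced copy of sample_data; report_plyr is a hypothetical name.

```r
# Assumption: reduced stand-in for the question's sample_data (same 10 rows, 4 columns)
sample_data <- data.frame(
  AMOUNT         = c(30, 1, 100, 300, 100, 25, 500, 50, 25, 50),
  YEAR           = c("FY18", "FY16", "FY17", "FY17", "FY17",
                     "FY14", "FY18", "FY17", "FY18", "FY18"),
  REPORT_CODE    = c("S", "Q", "Q", "S", "S", "J", "S", "R", "R", "R"),
  PAYMENT_METHOD = c("Check", "Other", "Credit_card", "Credit_card", "Credit_card",
                     "Check", "Credit_card", "Check", "Check", "Credit_card"),
  stringsAsFactors = FALSE
)
cat.list <- c("REPORT_CODE", "PAYMENT_METHOD")

# ddply() accepts grouping columns as a character vector, so no quoting tricks are needed
report_plyr <- function(category) {
  out <- plyr::ddply(sample_data, c(category, "YEAR"), plyr::summarise,
                     num = length(AMOUNT), total = sum(AMOUNT))
  names(out)[1] <- "REPORT_VALUE"   # first column is the grouping category
  out$REPORT_CATEGORY <- category
  out
}

all_reports_plyr <- do.call(rbind, lapply(cat.list, report_plyr))
```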
sqldf
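A possible sqldf version, again only a sketch under assumptions: the sqldf package is installed, sample_data is a reduced copy of the question's data, and report_sql is a hypothetical name. The column name is pasted into the SQL text, so this is only appropriate for trusted, known column names.

```r
library(sqldf)

# Assumption: reduced stand-in for the question's sample_data (same 10 rows, 4 columns)
sample_data <- data.frame(
  AMOUNT         = c(30, 1, 100, 300, 100, 25, 500, 50, 25, 50),
  YEAR           = c("FY18", "FY16", "FY17", "FY17", "FY17",
                     "FY14", "FY18", "FY17", "FY18", "FY18"),
  REPORT_CODE    = c("S", "Q", "Q", "S", "S", "J", "S", "R", "R", "R"),
  PAYMENT_METHOD = c("Check", "Other", "Credit_card", "Credit_card", "Credit_card",
                     "Check", "Credit_card", "Check", "Check", "Credit_card"),
  stringsAsFactors = FALSE
)
cat.list <- c("REPORT_CODE", "PAYMENT_METHOD")

# Build the query text from the column name, then let sqldf run it on sample_data
report_sql <- function(category) {
  query <- sprintf(
    "select %1$s as REPORT_VALUE, YEAR, count(*) as num, sum(AMOUNT) as total,
            '%1$s' as REPORT_CATEGORY
     from sample_data
     group by %1$s, YEAR",
    category
  )
  sqldf(query)
}

all_reports_sql <- do.call(rbind, lapply(cat.list, report_sql))
```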
base - by
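A base R version using by could look roughly like this. It is a sketch, not the original code; report_by is a hypothetical name and sample_data is a reduced copy of the question's data. Rows come out ordered by YEAR within the by array rather than in dplyr's order.

```r
# Assumption: reduced stand-in for the question's sample_data (same 10 rows, 4 columns)
sample_data <- data.frame(
  AMOUNT         = c(30, 1, 100, 300, 100, 25, 500, 50, 25, 50),
  YEAR           = c("FY18", "FY16", "FY17", "FY17", "FY17",
                     "FY14", "FY18", "FY17", "FY18", "FY18"),
  REPORT_CODE    = c("S", "Q", "Q", "S", "S", "J", "S", "R", "R", "R"),
  PAYMENT_METHOD = c("Check", "Other", "Credit_card", "Credit_card", "Credit_card",
                     "Check", "Credit_card", "Check", "Check", "Credit_card"),
  stringsAsFactors = FALSE
)
cat.list <- c("REPORT_CODE", "PAYMENT_METHOD")

report_by <- function(category) {
  # Split on category x YEAR; combinations with no rows yield NULL and are dropped by rbind
  pieces <- by(sample_data, sample_data[c(category, "YEAR")], function(d) {
    data.frame(
      REPORT_VALUE    = d[[category]][1],
      YEAR            = d$YEAR[1],
      num             = nrow(d),
      total           = sum(d$AMOUNT),
      REPORT_CATEGORY = category,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

all_reports_by <- do.call(rbind, lapply(cat.list, report_by))
```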
data.table

The data.table package provides another alternative, but that has already been covered by another answer.
Update: Added additional alternatives.
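To make the first two alternatives in this answer concrete (passing a name built with as.name, and a report that takes a character string), here is a minimal sketch. It is not the original code: the do.call wrapper, the name report_chr, the reduced sample_data and the two-element cat.list are all assumptions for illustration.

```r
library(dplyr)
library(purrr)
library(rlang)

# Assumption: reduced stand-in for the question's sample_data (same 10 rows, 4 columns)
sample_data <- data.frame(
  AMOUNT         = c(30, 1, 100, 300, 100, 25, 500, 50, 25, 50),
  YEAR           = c("FY18", "FY16", "FY17", "FY17", "FY17",
                     "FY14", "FY18", "FY17", "FY18", "FY18"),
  REPORT_CODE    = c("S", "Q", "Q", "S", "S", "J", "S", "R", "R", "R"),
  PAYMENT_METHOD = c("Check", "Other", "Credit_card", "Credit_card", "Credit_card",
                     "Check", "Credit_card", "Check", "Check", "Credit_card"),
  stringsAsFactors = FALSE
)
cat.list <- c("REPORT_CODE", "PAYMENT_METHOD")

# NSE version, as in the question (label built with quo_name)
report <- function(report_cat) {
  report_cat <- enquo(report_cat)
  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY = quo_name(report_cat))
}

# as.name: do.call() builds the call report(REPORT_CODE), so enquo() sees a bare name
via_names <- lapply(cat.list, function(x) do.call("report", list(as.name(x))))

# character argument: convert the string inside the function instead
report_chr <- function(report_cat) {
  sample_data %>%
    group_by(!!sym(report_cat), YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!sym(report_cat)) %>%
    mutate(REPORT_CATEGORY = report_cat)
}
via_strings <- map_df(cat.list, report_chr)
```

The character-argument form is the one that plugs directly into lapply or map, at the cost of no longer accepting bare column names.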
I'm not really a dplyr aficionado, but for what it's worth, here is how you could achieve this using library(data.table) instead:
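The original snippet is not reproduced here; one possible data.table version, as a sketch under the assumption of a reduced copy of the sample data (named dt here) and a hypothetical helper report_dt, might be:

```r
library(data.table)

# Assumption: reduced stand-in for the question's sample_data (same 10 rows, 4 columns)
dt <- data.table(
  AMOUNT         = c(30, 1, 100, 300, 100, 25, 500, 50, 25, 50),
  YEAR           = c("FY18", "FY16", "FY17", "FY17", "FY17",
                     "FY14", "FY18", "FY17", "FY18", "FY18"),
  REPORT_CODE    = c("S", "Q", "Q", "S", "S", "J", "S", "R", "R", "R"),
  PAYMENT_METHOD = c("Check", "Other", "Credit_card", "Credit_card", "Credit_card",
                     "Check", "Credit_card", "Check", "Check", "Credit_card")
)
cat.list <- c("REPORT_CODE", "PAYMENT_METHOD")

report_dt <- function(category) {
  by_cols <- c(category, "YEAR")            # grouping columns given as strings
  out <- dt[, .(num = .N, total = sum(AMOUNT)), by = by_cols]
  setnames(out, category, "REPORT_VALUE")   # rename the category column in place
  out[, REPORT_CATEGORY := category]        # add the label column by reference
  out
}

all_reports <- rbindlist(lapply(cat.list, report_dt))
```

Because by accepts a character vector, the strings in cat.list can be passed through lapply directly, with no quoting or unquoting step.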