I am struggling to use dplyr in R to operate on columns of a data frame that are known only by their string names. I know dplyr was recently updated to support quosures and the like, and I've reviewed what I think are the relevant parts of the new "Programming with dplyr" article here: http://dplyr.tidyverse.org/articles/programming.html. However, I'm still not able to do what I want.
My situation is that I know a column of a data frame only by its string name. Thus, I can't use non-standard evaluation in a call to dplyr within a function, or even in a script where the column name may change between runs, because I generally can't hard-code the unquoted (i.e., "bare") column name. I'm wondering how to get around this, and I'm guessing I'm overlooking something in the new quoting/unquoting syntax.
For example, suppose I have user inputs that define cutoff percentiles for a distribution of data. A user may run the code using any percentile he/she would like, and the percentile he/she picks will change the output. Within the analysis, a column in an intermediate data frame is created with the name of the percentile that is used; thus this column's name changes depending on the cutoff percentile input by the user.
Below is a minimal example to illustrate. I want to call the function with various values for the cutoff percentile. I want the data frame named MPGCutoffs to have a column named according to the chosen cutoff quantile (this currently works in the code below), and I want to operate on this column later. Because the column name is general, at the time of writing the function I know it only in terms of the input pctCutoff, so I need a way to operate on it knowing only the string stored in probColName, which follows a predefined pattern based on the value of pctCutoff.
library(dplyr)  # provides %>% and the verbs used below

userInput_prob1 <- 0.95
userInput_prob2 <- 0.9

# Function to get cars that have the "best" MPG
# fuel economy, where "best" is defined by the
# percentile cutoff passed to the function.
getBestMPG <- function( pctCutoff ){

  # Define new column name to hold the MPG percentile cutoff.
  probColName <- paste0('P', pctCutoff*100)

  # Compute the MPG percentile cutoff by number of gears.
  MPGCutoffs <- mtcars %>%
    dplyr::group_by( gear ) %>%
    dplyr::summarize( !!probColName := quantile(mpg, pctCutoff) )

  # Filter mtcars with only MPG values above cutoffs.
  output <- mtcars %>%
    dplyr::left_join( MPGCutoffs, by='gear' ) %>%
    dplyr::filter( mpg > !!probColName ) #****This doesn't run; this is where I'm stuck

  # Return filtered data.
  return(output)
}

best_1 <- getBestMPG( userInput_prob1 )
best_2 <- getBestMPG( userInput_prob2 )
The dplyr::filter() statement is what I can't get to run properly. I've tried:
dplyr::filter( mpg > probColName )
- No error, but no rows returned.
dplyr::filter( mpg > !!probColName )
- No error, but no rows returned.
I've also seen examples where I could pass something like quo(P95) to the function and then unquote it in the call to dplyr::filter(); I've gotten this to work, but it doesn't solve my problem since it requires hard-coding the variable name outside the function. For example, if I do this and the percentile passed by the user is 0.90, then the call to dplyr::filter() fails because the column created is named P90 and not P95.
Any help would be greatly appreciated. I'm hoping there's an easy solution that I'm just overlooking.
Here's an alternate solution from Hadley's comment in the post referred to in MrFlick's answer (https://github.com/tidyverse/rlang/issues/116). Using as.name() from base R takes the place of rlang::sym(), and you still need to unquote it. That is, the following also works:
dplyr::filter( mpg > !!as.name(probColName) )
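To see the difference in isolation, here is a small sketch on a toy data frame. The data frame df and its values are made up for illustration; only as.name() and the !! operator come from the answer above.

```r
library(dplyr)

probColName <- "P95"  # hypothetical column name, known only as a string

# Toy data frame (an assumption for this sketch) with a column named like probColName.
df <- data.frame(mpg = c(10, 20, 30), P95 = c(15, 15, 15))

# as.name() turns the string into a symbol; !! splices that symbol into the
# expression, so filter() effectively sees `mpg > P95`.
viaSymbol <- df %>% dplyr::filter( mpg > !!as.name(probColName) )

# Without the conversion, !! splices in the string itself, so mpg is compared
# against the literal value "P95" rather than against the column.
viaString <- df %>% dplyr::filter( mpg > !!probColName )

nrow(viaSymbol)
nrow(viaString)
```

The second call is the same pattern as the attempt in the question, which is why it ran without error but did not return the rows you expected.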
If you have a column name in a string (aka character vector) and you want to use it with tidyeval, then you can convert it with rlang::sym(). Just change
dplyr::filter( mpg > !!probColName )
to
dplyr::filter( mpg > !!rlang::sym(probColName) )
and it should work. This is taken from the recommendation at this GitHub issue: https://github.com/tidyverse/rlang/issues/116
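For reference, this is roughly what the function from the question looks like with rlang::sym() applied in the filter step. It is a sketch that assumes dplyr and rlang are installed; everything else is taken from the question's code.

```r
library(dplyr)

getBestMPG <- function( pctCutoff ){

  # Column name is only known as a string built from the input.
  probColName <- paste0('P', pctCutoff*100)

  # Compute the MPG percentile cutoff by number of gears.
  MPGCutoffs <- mtcars %>%
    dplyr::group_by( gear ) %>%
    dplyr::summarize( !!probColName := quantile(mpg, pctCutoff) )

  # Convert the string to a symbol, then unquote it so filter()
  # refers to the column instead of the literal string.
  mtcars %>%
    dplyr::left_join( MPGCutoffs, by='gear' ) %>%
    dplyr::filter( mpg > !!rlang::sym(probColName) )
}

best_1 <- getBestMPG( 0.95 )
best_2 <- getBestMPG( 0.9 )
```

Each result keeps the cutoff column (P95 or P90) alongside the original mtcars columns, so you can check the filter by comparing mpg against it.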
It's still fine to use
dplyr::summarize( !!probColName := quantile(mpg, pctCutoff) )
because when dynamically setting a parameter name, you just need the string and not an unquoted symbol.
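A quick way to convince yourself of this is to check the resulting column names. In the sketch below, the 0.9 cutoff and the name "P90" are just example values:

```r
library(dplyr)

probColName <- "P90"  # example name; in the question it is built from pctCutoff

# On the left-hand side of :=, the unquoted string is used directly
# as the name of the new column.
cutoffs <- mtcars %>%
  dplyr::group_by( gear ) %>%
  dplyr::summarize( !!probColName := quantile(mpg, 0.9) )

names(cutoffs)
```

So the string is enough when you are naming a column, and the symbol conversion is only needed when you refer to an existing column inside an expression.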