Filter data frame by character column name (in dpl

Posted 2020-01-23 09:47

Question:

I have a data frame and want to filter it in one of two ways, by either column "this" or column "that". I would like to be able to refer to the column name as a variable. How (in dplyr, if that makes a difference) do I refer to a column name by a variable?

library(dplyr)
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
df
#   this that
# 1    1    1
# 2    2    1
# 3    2    2
df %>% filter(this == 1)
#   this that
# 1    1    1

But say I want to use the variable column to hold either "this" or "that", and filter on whatever the value of column is. Both as.symbol and get work in other contexts, but not this:

column <- "this"
df %>% filter(as.symbol(column) == 1)
# [1] this that
# <0 rows> (or 0-length row.names)
df %>% filter(get(column) == 1)
# Error in get("this") : object 'this' not found

How can I turn the value of column into a column name?

Answer 1:

From the current dplyr help file (emphasis mine):

dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.

So we basically need to do two things to be able to refer to the value "this" of the variable column inside dplyr::filter():

  1. We need to turn the variable column which is of type character into type symbol.

    Using base R this can be achieved by the function as.symbol() which is an alias for as.name(). The former is preferred by the tidyverse developers because it follows a more modern terminology (R types instead of S modes).

    Alternatively the same can be achieved by rlang::sym() from the tidyverse.

  2. We need to unquote the symbol from 1).

    What exactly unquoting means is explained in the vignette Programming with dplyr. It is achieved by the function UQ() or, as syntactic sugar, by !!. There are situations – like yours – where only the former works correctly, because !! can collide with the single !.

Applied to your example:

library(dplyr)
df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

df %>% filter(UQ(as.symbol(column)) == 1)
#   this that
# 1    1    1

But not:

df %>% filter(!!as.symbol(column) == 1)
# [1] this that
# <0 rows> (or 0-length row.names)

The syntactic sugar !! works as expected again if you add an extra pair of parentheses (thanks to Martijn vd Voort for the suggestion):

df %>% filter((!!as.symbol(column)) == 1)
#   this that
# 1    1    1

Or if you just interchange the two comparison operands (thanks to carand for the hint):

df %>% filter(1 == !!as.symbol(column))
#   this that
# 1    1    1
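
The two steps above can also be wrapped in a small helper. This is only a sketch: filter_by_name() is a hypothetical name, not part of dplyr, and it assumes a dplyr version with tidy evaluation, so that rlang::sym() and !! are available. The value is unquoted as well, so that a column that happens to be called value cannot shadow the argument.

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))

# filter_by_name() is a hypothetical helper for illustration, not a dplyr function.
filter_by_name <- function(data, column, value) {
  col_sym <- rlang::sym(column)  # step 1: character string -> symbol
  # step 2: unquote the symbol; the parentheses keep the grouping explicit
  data %>% filter((!!col_sym) == !!value)
}

by_this <- filter_by_name(df, "this", 1)  # rows where this == 1
by_that <- filter_by_name(df, "that", 1)  # rows where that == 1
by_this
by_that
```

Because the column name is an ordinary argument, the same call works for either "this" or "that" without touching the filter expression.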


Answer 2:

I would steer clear of using get() altogether. It seems like it would be quite dangerous in this situation, especially if you're programming. You could use either an unevaluated call or a pasted character string, but you'll need to use filter_() instead of filter().

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

Option 1 - using an unevaluated call:

You can hard-code y as 1, but here I show it as y to illustrate how you can change the expression values easily.

expr <- lazyeval::interp(quote(x == y), x = as.name(column), y = 1)
## or 
## expr <- substitute(x == y, list(x = as.name(column), y = 1))
df %>% filter_(expr)
#   this that
# 1    1    1

Option 2 - using paste() (and obviously easier):

df %>% filter_(paste(column, "==", 1))
#   this that
# 1    1    1

The main thing about these two options is that we need to use filter_() instead of filter(). In fact, from what I've read, if you're programming with dplyr you should always use the *_() functions.

I used this post as a helpful reference: character string as function argument r, and I'm using dplyr version 0.3.0.2.
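
To make Option 1 a bit more concrete, here is a sketch in base R only that shows what the unevaluated call looks like and how it can be evaluated against the data frame. It is not how filter_() is implemented; the names keep and result are just for illustration.

```r
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

# Build the call `this == 1` without evaluating it.
expr <- substitute(x == y, list(x = as.name(column), y = 1))
expr_text <- deparse(expr)  # the call as text, for inspection

# Evaluate the call with the data frame's columns in scope,
# then use the resulting logical vector to subset the rows.
keep <- eval(expr, envir = df, enclos = parent.frame())
result <- df[keep, , drop = FALSE]
result
```

The point is that the column name enters the expression as a symbol before anything is evaluated, which is the same idea filter_() relies on.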



Answer 3:

Regarding Richard's solution, I just want to add that if the column is of type character, you can use shQuote() to filter by character values.

For example, you can use

df %>% filter_(paste(column, "==", shQuote("a")))

If you have multiple filters, you can specify collapse = "&" in paste.

df %>% filter_(paste(c("column1","column2"), "==", shQuote(c("a","b")), collapse = "&"))
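
It can help to look at the string that paste() actually builds before handing it to filter_(). The sketch below uses base R only; df_chr is a made-up data frame, and the example assumes the values contain no quote characters, since shQuote() is meant for shell quoting.

```r
column_names <- c("column1", "column2")  # column names as used in the answer
values <- c("a", "b")

# One "name == 'value'" piece per column, joined into a single condition.
condition <- paste(column_names, "==", shQuote(values), collapse = " & ")
condition

# df_chr is a hypothetical data frame, only here to evaluate the condition.
df_chr <- data.frame(column1 = c("a", "a", "c"),
                     column2 = c("b", "x", "b"),
                     stringsAsFactors = FALSE)

# Parse the string into an expression and evaluate it against the columns.
keep <- eval(parse(text = condition), envir = df_chr)
df_chr[keep, , drop = FALSE]
```

Only the first row has column1 equal to "a" and column2 equal to "b", so that is the only row kept.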


Answer 4:

Here's another solution for the latest dplyr version:

df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

df %>% filter(.[[column]] == 1)

#   this that
# 1    1    1
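
A closely related variant is the .data pronoun, which the dplyr programming vignette describes for column names held in character variables. A minimal sketch, assuming a dplyr version with tidy evaluation:

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

# .data refers to the data frame that filter() is currently working on,
# so .data[[column]] looks the column up by its name as a string.
result <- df %>% filter(.data[[column]] == 1)
result
```

Unlike the dot in .[[column]], which is the magrittr placeholder for whatever was piped in, .data is resolved by dplyr itself, so the same expression also works when filter() is called without a pipe.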


Answer 5:

Or using filter_at

library(dplyr)
df %>% 
   filter_at(vars(column), any_vars(. == 1))
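
In newer dplyr versions the same idea can be written with if_any() and all_of(). This is a sketch that assumes dplyr 1.0.4 or later, where if_any() is available; df and column are defined as in the question.

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

# all_of() selects the columns named in the character vector;
# if_any() keeps a row when the condition holds for any selected column.
result <- df %>%
  filter(if_any(all_of(column), ~ .x == 1))
result
```

With a single column name, if_any() and if_all() give the same result; they only differ once column holds several names.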


Answer 6:

As Salim B explained above, but with a minor change:

df %>% filter(1 == !!as.name(column))

i.e. just reverse the condition because !! otherwise behaves like

!!(as.name(column)==1)
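
To see where this grouping comes from, it helps to look at how the R parser itself reads the three variants. The sketch below uses only base R, and a is just a placeholder symbol; it shows the parse tree, not what any particular rlang version does with it afterwards.

```r
# In R's grammar the unary ! binds less tightly than ==.
plain    <- quote(!!a == 1)    # parsed as !(!(a == 1))
paren    <- quote((!!a) == 1)  # parentheses force == to be the outer call
reversed <- quote(1 == !!a)    # !!a can only be the right-hand operand of ==

# The outermost function of each call tells us how it was grouped.
outer_plain    <- as.character(plain[[1]])
outer_paren    <- as.character(paren[[1]])
outer_reversed <- as.character(reversed[[1]])

# For the plain form, the comparison sits inside both negations.
innermost <- plain[[2]][[2]]

c(plain = outer_plain, paren = outer_paren, reversed = outer_reversed)
innermost
```

In the plain form the outermost call is !, while the parenthesised and reversed forms both have == on the outside, which is why those two filter as intended.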


Tags: r dplyr