How to filter a column by multiple, flexible criteria

Posted 2019-09-08 14:50

Question:

I'm writing a function to aggregate a dataframe, and it needs to be generally applicable to a wide variety of datasets. One step in this function is dplyr's filter function, used to select from the data only the ad campaign types relevant to the task at hand. Since I need the function to be flexible, I want ad_campaign_types as an input, but this makes filtering kind of hairy, like so:

aggregate_data <- function(ad_campaign_types) {
  raw_data %>%
    filter(ad_campaign_type == ad_campaign_types) -> agg_data
  agg_data
}
new_data <- aggregate_data(ad_campaign_types = c("campaign_A", "campaign_B", "campaign_C"))

I would think the above would work, but while it runs, oddly enough it returns only a small fraction of what the filtered dataset should be. Is there a better way to do this?

Another tiny example of reproducible code:

ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
revenue <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
data <- as.data.frame(cbind(ad_types, revenue))

# Now, filtering to select only ad types "a", "b", and "d",
# which should leave us with only 7 values
new_data <- filter(data, ad_types == c("a", "b", "d"))
nrow(new_data)
[1] 3

Answer 1:

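For multiple criteria, use the %in% operator. With ==, R compares the column to the vector element by element, recycling the shorter vector, so only rows that happen to line up with the recycled values match: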

filter(data, ad_types %in% c("a", "b", "d"))
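
The difference between the two operators can be checked in base R without dplyr. This is a minimal sketch that reuses the ad_types vector from the question; the names wanted, recycled, eq_mask and in_mask are made up for illustration.

```r
ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
wanted <- c("a", "b", "d")

# `==` works element by element, so `wanted` is recycled to length 10:
# "a" "b" "d" "a" "b" "d" "a" "b" "d" "a"
recycled <- rep(wanted, length.out = length(ad_types))
eq_mask <- ad_types == recycled
sum(eq_mask)  # 3 rows happen to line up with the recycled values

# `%in%` asks, for each element, whether it appears anywhere in `wanted`
in_mask <- ad_types %in% wanted
sum(in_mask)  # 7 rows, as expected
```

Comparing ad_types == wanted directly gives the same logical vector as eq_mask, along with a warning that the longer length is not a multiple of the shorter one.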

You can also use a "not in" criterion:

filter(data, !(ad_types %in% c("a", "b", "d")))

However, notice that %in% handles NA a little differently than ==:

> c(2, NA) == 2
[1] TRUE   NA
> c(2, NA) %in% 2
[1]  TRUE FALSE

Some find one of these more intuitive than the other, but you have to keep the difference in mind.
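
A small base R sketch of where the NA difference shows up in practice. The vector x is made up for illustration. Subsetting with == carries the NA through, while %in% never returns NA, and the two diverge again once the condition is negated.

```r
x <- c(2, NA, 3)

eq <- x == 2    # TRUE    NA FALSE
isin <- x %in% 2  # TRUE FALSE FALSE

# Subsetting with a logical NA yields an NA element
x[eq]    # 2 NA
x[isin]  # 2

# Negation: NA stays NA with ==, but becomes TRUE with %in%
not_eq <- !(x == 2)    # FALSE   NA TRUE
not_in <- !(x %in% 2)  # FALSE TRUE TRUE
```

dplyr's filter() drops rows where the condition is NA, so a "not equal" filter discards NA rows while a "not in" filter keeps them.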

To filter on several different criteria at once, chain the conditions with the and/or operators (& and |):

filter(mtcars, cyl > 2 & wt < 2.5 & gear == 4)
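
As a sketch of how %in% combines with other conditions, the example below mixes it with & and | on the built-in mtcars data. The result names light_or_4gear and same_result are made up for illustration. Parentheses make the precedence of | explicit, and comma-separated conditions in filter() are combined with "and".

```r
library(dplyr)

# 4- or 6-cylinder cars that are either light or have 4 gears
light_or_4gear <- filter(mtcars, cyl %in% c(4, 6) & (wt < 2.5 | gear == 4))

# Equivalent form: separate arguments to filter() are ANDed together
same_result <- filter(mtcars, cyl %in% c(4, 6), wt < 2.5 | gear == 4)
```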


Answer 2:

Tim is correct about filtering a dataframe. However, if you want to wrap dplyr verbs in a function of your own, you need to follow the instructions at this webpage: https://rpubs.com/hadley/dplyr-programming.

Here is the code I would suggest:

library(tidyverse)
ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
revenue <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# tibble() replaces the deprecated data_frame()
df <- tibble(ad_types = as.factor(ad_types), revenue = revenue)

aggregate_data <- function(df, ad_types, my_list) {
  ad_types <- enquo(ad_types) # Capture the column name as a quosure
  df %>%
    filter((!!ad_types) %in% my_list) # Unquote it; !! replaces the deprecated UQ()
}

new_data <- aggregate_data(df = df, ad_types = ad_types,
                           my_list = c("a", "b", "c"))

That should work!
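
If the column name arrives as a string (for example from a configuration file), one alternative is the .data pronoun, which avoids quosures altogether. This is a sketch under that assumption, not the approach from the answer above. The function name filter_by_values and its arguments are hypothetical.

```r
library(dplyr)

# Hypothetical helper: keep rows whose `column` value is in `values`
filter_by_values <- function(df, column, values) {
  df %>%
    filter(.data[[column]] %in% values)  # look the column up by its string name
}

example_df <- data.frame(
  ad_types = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  revenue = 1:10
)

kept <- filter_by_values(example_df, "ad_types", c("a", "b", "d"))
nrow(kept)  # 7
```

Passing the data frame in as an argument, rather than relying on a global such as raw_data, also makes the function easier to reuse across datasets.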



Tags: r function dplyr