I'm writing a function to aggregate a data frame, and it needs to work across a wide variety of datasets. One step in this function is dplyr's filter function, used to select from the data only the ad campaign types relevant to the task at hand. Since the function needs to be flexible, I want ad_campaign_types as an input, but this makes filtering kind of hairy, like so:
aggregate_data <- function(ad_campaign_types) {
  raw_data %>%
    filter(ad_campaign_type == ad_campaign_types) -> agg_data
  agg_data
}
new_data <- aggregate_data(ad_campaign_types = c("campaign_A", "campaign_B", "campaign_C"))
I would think the above would work, but while it runs, oddly enough it returns only a small fraction of what the filtered dataset should be. Is there a better way to do this?
Here is a tiny reproducible example:
ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
revenue <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
data <- data.frame(ad_types, revenue)  # avoids cbind(), which would coerce revenue to character
# Now, filtering to select only ad types "a", "b", and "d",
# which should leave us with only 7 values
new_data <- filter(data, ad_types == c("a", "b", "d"))
nrow(new_data)
[1] 3
To match against several values at once, use the %in% operator:
filter(data, ad_types %in% c("a", "b", "d"))
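For context on why the == version returned only three rows, here is a small base R sketch using the vector from the question. The names wanted, eq_match and in_match are just illustrative.

```r
ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
wanted <- c("a", "b", "d")

# == compares element by element, recycling `wanted` to length 10
# (R also warns that 10 is not a multiple of 3).
eq_match <- suppressWarnings(ad_types == wanted)
sum(eq_match)  # 3: only the positions that happen to line up

# %in% asks, for each element, whether it appears anywhere in `wanted`.
in_match <- ad_types %in% wanted
sum(in_match)  # 7
```

So == is not testing set membership at all; it is comparing against a repeated copy of the shorter vector.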
You can also use a "not in" criterion:
filter(data, !(ad_types %in% c("a", "b", "d")))
However, notice that %in% behaves a little differently from == when NA is involved:
> c(2, NA) == 2
[1] TRUE NA
> c(2, NA) %in% 2
[1] TRUE FALSE
Some find one of these more intuitive than the other, but you have to keep the difference in mind.
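A minimal sketch of where this difference shows up in practice, assuming a toy tibble with a single column x (not from the question). filter() drops rows where the condition is NA, so the two "not equal" forms can return different rows:

```r
library(dplyr)

df <- tibble(x = c(2, NA, 3))

# !(x == 2) evaluates to FALSE, NA, TRUE; filter() drops the NA row too.
not_eq <- filter(df, !(x == 2))
nrow(not_eq)  # 1

# !(x %in% 2) evaluates to FALSE, TRUE, TRUE; the NA row is kept.
not_in <- filter(df, !(x %in% 2))
nrow(not_in)  # 2
```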
To filter on several different criteria, chain the conditions with & (and) or | (or):
filter(mtcars, cyl > 2 & wt < 2.5 & gear == 4)
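Putting the pieces together for the function in the question, a sketch might look like the following. The data and min_revenue arguments and the sample raw_data are assumptions added for illustration; the original function read raw_data from the global environment and had no second criterion.

```r
library(dplyr)

# Hypothetical rewrite: `data` and `min_revenue` are illustrative arguments,
# not part of the original function.
aggregate_data <- function(data, ad_campaign_types, min_revenue = 0) {
  data %>%
    filter(ad_campaign_type %in% ad_campaign_types & revenue >= min_revenue)
}

# Made-up sample data, only to exercise the function.
raw_data <- tibble(
  ad_campaign_type = c("campaign_A", "campaign_B", "campaign_C", "campaign_D", "campaign_A"),
  revenue = c(1, 2, 3, 4, 5)
)

new_data <- aggregate_data(raw_data, c("campaign_A", "campaign_B"), min_revenue = 2)
nrow(new_data)  # 2
```

Because the column name is fixed and only the values vary, no special programming tools are needed here: %in% with a plain character vector argument is enough.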
Tim is correct about filtering a data frame. However, if you want to write a function that takes the column itself as an argument, you need to follow the instructions on this webpage: https://rpubs.com/hadley/dplyr-programming.
Here is the code I would suggest:
library(tidyverse)
ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
revenue <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# tibble() replaces the deprecated data_frame()
df <- tibble(ad_types = as.factor(ad_types), revenue = revenue)
aggregate_data <- function(df, ad_types, my_list) {
  ad_types <- enquo(ad_types)  # Capture the column name as a quosure
  df %>%
    filter((!!ad_types) %in% my_list)  # Unquote it; !! replaces the deprecated UQ()
}
new_data <- aggregate_data(df = df, ad_types = ad_types,
                           my_list = c("a", "b", "c"))
That should work!
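As an alternative sketch, newer versions of rlang (0.4.0 and later) offer the {{ }} "embrace" operator, which combines the capture and unquote steps. The function name filter_by_values and the argument names col and values are hypothetical, chosen only for this example.

```r
library(dplyr)

df <- tibble(
  ad_types = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  revenue = 1:10
)

# `col` is a hypothetical argument name; {{ col }} forwards the bare
# column name the caller supplies into filter().
filter_by_values <- function(df, col, values) {
  df %>%
    filter({{ col }} %in% values)
}

new_data <- filter_by_values(df, ad_types, c("a", "b", "d"))
nrow(new_data)  # 7
```

This is only worth the extra machinery when the column to filter on varies between calls; for a fixed column, a plain %in% filter is simpler.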